awka – awk to C converter

I love AWK. I am in the minority. While many people use awk one-liners in shell scripts, few use awk as it was originally intended: as a stand-alone language. I used it extensively back in the 1990s and have continued to use it as my go-to language whenever I need heavy-lifting text-file manipulation.

Recently I wrote a small but powerful awk program to manipulate certain fields in genealogy GEDCOM files. My problem was that the GEDCOM standard allows NOTES and SOURCES to be either in-line or linked by cross-reference (XREF). For example, here are a couple of lines of an “in-line” note:

1 NOTE This file demonstrates all tags that are allowed in GEDCOM 5.5. Here are some comments about the HEADER record
2 CONC and comments about where to look for information on the other 9 types of GEDCOM records. Most other records will

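The note text immediately follows the NOTE keyword, which is in field 2. The CONC line that follows is a “concatenation” line: its text is appended directly to the preceding line. (A CONT, or “continuation”, line instead starts a new line of text.)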
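To make the CONC/CONT distinction concrete, here is a minimal Python sketch. It is not part of the original awk program. It splits a GEDCOM line into level, optional cross-reference, tag and value, then rebuilds a note's full text. It assumes the GEDCOM 5.5 rule that CONC appends text with no separator while CONT starts a new line. The regular expression is a simplification of the standard's line grammar, and the helper names `parse_line` and `note_text` are hypothetical.

```python
import re

# Simplified GEDCOM line grammar (an assumption, not the full spec):
# level, optional @xref@, tag, optional value after a single space.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


def parse_line(line):
    """Hypothetical helper: split one line into (level, xref, tag, value)."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = m.groups()
    return int(level), xref, tag, value or ""


def note_text(lines):
    """Hypothetical helper: rebuild a note from its first line plus CONC/CONT lines.

    CONC appends to the current line with no separator;
    CONT starts a new line.
    """
    _, _, _, text = parse_line(lines[0])
    for line in lines[1:]:
        _, _, tag, value = parse_line(line)
        if tag == "CONC":
            text += value
        elif tag == "CONT":
            text += "\n" + value
    return text


if __name__ == "__main__":
    print(note_text(["1 NOTE Hello wor", "2 CONC ld", "2 CONT", "2 CONT Bye"]))
```

In awk, the level and tag are simply `$1` and `$2`. The point of the sketch is the CONC/CONT rule, which any converter has to respect when it moves note text around.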

Here is what a cross-linked/XREF note looks like:

1 NOTE @N24@
1 CHAN
2 DATE 11 Jan 2001
3 TIME 16:00:06
1 RIN 1
… (many lines later)
0 @N24@ NOTE
1 CONC Comments on "Charlie Accented ANSEL" INDIVIDUAL Record.
1 CONT
1 CONT To represent accented characters, the ANSEL character set uses two-byte codes. The 
…

The @N24@ on the NOTE line is a cross-reference to the actual note record, 0 @N24@ NOTE, which appears much later in the file (often near the bottom).

Recently I have been using the genealogy website WikiTree.com, and its new GEDCOM import process (GEDCOMpare) has some bugs. Currently it discards cross-linked notes. To work around this, I wrote an awk program that converts cross-linked Notes and Sources into in-line ones, which mitigates the WikiTree GEDCOM import problem.
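The conversion can be sketched in two passes. The first pass collects every top-level `0 @Xnn@ NOTE` record. The second replaces each `n NOTE @Xnn@` pointer with that note's text, shifting the record's level-1 CONC/CONT lines down to level n+1. This is a hypothetical Python illustration, not the author's awk program, and all function names are made up for the example. It handles NOTE only, since SOURCE records carry more structure. It ignores anything under a note record other than CONC/CONT, and it leaves the original note records in place.

```python
import re

# Simplified GEDCOM line grammar (an assumption, not the full spec).
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


def parse_line(line):
    """Hypothetical helper: split one line into (level, xref, tag, value)."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = m.groups()
    return int(level), xref, tag, value or ""


def fmt(level, tag, value):
    """Format a GEDCOM line, omitting the value when it is empty."""
    return f"{level} {tag} {value}" if value else f"{level} {tag}"


def collect_notes(lines):
    """Pass 1: map each '0 @Xnn@ NOTE' record to (first_value, [(tag, value), ...]).

    Only the record's own value and its level-1 CONC/CONT lines are kept.
    """
    notes, current = {}, None
    for line in lines:
        level, xref, tag, value = parse_line(line)
        if level == 0:
            current = xref if (xref and tag == "NOTE") else None
            if current:
                notes[current] = (value, [])
        elif current and level == 1 and tag in ("CONC", "CONT"):
            notes[current][1].append((tag, value))
    return notes


def inline_notes(lines):
    """Pass 2: replace 'n NOTE @Xnn@' pointers with the referenced note's text."""
    notes = collect_notes(lines)
    out = []
    for line in lines:
        level, xref, tag, value = parse_line(line)
        if tag == "NOTE" and xref is None and value in notes:
            first, pieces = notes[value]
            out.append(fmt(level, "NOTE", first))
            # Continuation lines sit one level below the NOTE line they extend.
            out.extend(fmt(level + 1, t, v) for t, v in pieces)
        else:
            # Everything else, including the note records themselves, passes through.
            out.append(line.rstrip("\r\n"))
    return out
```

Keeping the `0 @N24@ NOTE` records in the output is a cautious choice. They become unreferenced, but no data is lost. Unknown cross-references are passed through unchanged.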

Now, with the background out of the way, on to the reason for this post. Awk isn’t blindingly fast, since it is an interpreted language, so a binary executable of the awk program would help with large GEDCOMs. I stumbled across a SourceForge project by Andrew Sumner named “awka”. In addition to SourceForge, it is also hosted on GitHub by noyesno of Shanghai, China. I compiled and installed it with only one issue: it put its library in /usr/local/lib instead of where my system needed it, /usr/lib. In a word: AWESOME! Awka translates an awk script into C source that compiles into an executable functionally identical to the original awk script.

If you’re into awk check out awka! And if you aren’t into awk – GET INTO AWK!

This entry was posted in Programming.

2 Responses to awka – awk to C converter

  1. Tim McTee says:

    Heads up: the Wikipedia GEDCOM page is very bad.
    Tamura Jones has all the GEDCOM specs and corrections:
    https://www.tamurajones.net/FamilySearchGEDCOMSpecifications.xhtml


    • celem says:

      Tim McTee, thanks for the suggestion. I have switched the GEDCOM link from Wikipedia to Tamura Jones. I never used the Wikipedia link’s data; I included it simply as a reference for readers, and Wikipedia is usually helpful. Tamura Jones is a better reference in that it links to the actual standards.

