I love AWK, which puts me in a minority. While many people use awk one-liners in shell scripts, few use awk as it was originally intended: as a stand-alone language. I used it extensively back in the 1990s, and it has remained my go-to language whenever I need some heavy-lifting text file manipulation.
Recently I wrote a small but powerful awk program to manipulate certain fields in genealogy GEDCOM files. The GEDCOM standard allows NOTES and SOURCES to be either written in-line or linked by cross reference (XREF). For example, here are a couple of lines of an “in-line” note:
1 NOTE This file demonstrates all tags that are allowed in GEDCOM 5.5. Here are some comments about the HEADER record
2 CONC and comments about where to look for information on the other 9 types of GEDCOM records. Most other records will
The note text immediately follows the NOTE tag, which is field 2. The CONC line that follows is a “continuation” of the preceding line.
Here is what a cross-linked/XREF note looks like:
1 NOTE @N24@
1 CHAN
2 DATE 11 Jan 2001
3 TIME 16:00:06
1 RIN 1
… (many lines later)
0 @N24@ NOTE
1 CONC Comments on "Charlie Accented ANSEL" INDIVIDUAL Record.
1 CONT
1 CONT To represent accented characters, the ANSEL character set uses two-byte codes. The …
The @N24@ on the NOTE line is a cross reference to the actual note record, 0 @N24@ NOTE, which appears much later (often near the bottom) in the file.
I have recently been using the genealogy website WikiTree.com, and its new GEDCOM import process (GEDCOMpare) has some bugs. Currently it discards cross-linked notes. To work around this, I wrote an awk program that converts cross-linked Notes and Sources to in-line ones, which mitigates the WikiTree GEDCOM import problem.
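To give a feel for the idea, here is a minimal sketch of how such a conversion could be done in awk, wrapped in a shell function. It is not the actual program described above: the function name inline_notes and all variable names are made up for illustration, it handles only NOTE records (not Sources), and it assumes each note record consists of a level-0 line followed by level-1 CONC/CONT lines. It reads the file twice: the first pass collects the note records by their XREF id, and the second pass replaces each pointer with the collected text and drops the now-redundant records.

```shell
#!/bin/sh
# inline_notes FILE: print FILE with cross-referenced NOTE records inlined.
# Hypothetical helper for illustration; the awk program reads FILE twice.
inline_notes() {
    awk '
    # Return what is left of a line after dropping its first n fields.
    function rest(line, n,    i) {
        for (i = 0; i < n; i++) sub(/^[^ ]* ?/, "", line)
        return line
    }

    # Print prefix alone, or prefix plus text when there is any text.
    function emit(prefix, s) {
        print (s == "" ? prefix : prefix " " s)
    }

    # Pass 1 (NR == FNR): remember each level-0 NOTE record by XREF id.
    NR == FNR {
        if ($1 == 0) {
            id = ($3 == "NOTE") ? $2 : ""
            if (id != "") { text[id] = rest($0, 3); count[id] = 0 }
        } else if (id != "" && $1 == 1 && ($2 == "CONC" || $2 == "CONT")) {
            n = ++count[id]
            tag[id, n] = $2
            body[id, n] = rest($0, 2)
        }
        next
    }

    # Pass 2: drop the level-0 NOTE records that were collected ...
    $1 == 0 { skipping = ($3 == "NOTE" && ($2 in text)) }
    skipping { next }

    # ... and expand each pointer such as "1 NOTE @N24@" in place,
    # pushing the CONC/CONT lines one level below the NOTE line.
    $1 > 0 && $2 == "NOTE" && NF == 3 && ($3 in text) {
        ref = $3
        sub_level = $1 + 1
        emit($1 " NOTE", text[ref])
        for (i = 1; i <= count[ref]; i++)
            emit(sub_level " " tag[ref, i], body[ref, i])
        next
    }

    # Everything else passes through unchanged.
    { print }
    ' "$1" "$1"
}
```

With the function loaded, something like inline_notes family.ged > family-inline.ged (file names are placeholders) would write the converted GEDCOM. In-line notes are left alone, because their text does not match any collected XREF id.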
Now, with the background out of the way, on to the reason for this post. Awk isn’t blindingly fast, since it is an interpreted language, so a binary executable of the awk program would help with large GEDCOMs. I stumbled across a SourceForge project by Andrew Sumner named “awka”. In addition to SourceForge, it is also hosted on GitHub by noyesno of Shanghai, China. I compiled and installed it with only one issue: it put its library in /usr/local/lib instead of /usr/lib, where I needed it for my system. In one word: AWESOME! Awka creates C source that compiles into an executable that is functionally identical to the original awk script.
If you’re into awk, check out awka! And if you aren’t into awk, GET INTO AWK!