gedcompy

Add support for Gedcom files starting with BOM

Open BioGeek opened this issue 10 years ago • 1 comment

Sites like geni.com let you export Gedcom files that start with a Byte Order Mark (BOM).

Currently the regex fails for such files and you get a NotImplementedError.

See this detailed article for more about GEDCOM & the Unicode Byte Order Mark.

I'm currently toying with a solution like the one described here, which removes the BOM and then encodes/decodes the string, but I still get strange characters in the output.
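For illustration, here is a minimal sketch of BOM handling on the raw bytes, before any regex sees the text. It is not gedcompy's code: the function name `decode_gedcom_bytes` and the `default_encoding` fallback are assumptions made for this example. Only the `codecs.BOM_*` constants come from the Python standard library.

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones:
# BOM_UTF32_LE starts with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_gedcom_bytes(data, default_encoding="utf-8"):
    """Hypothetical helper: strip a leading BOM and decode accordingly.

    If no BOM is found, fall back to default_encoding (an assumption;
    a real parser might consult the file header instead).
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the BOM bytes so they never reach the line parser.
            return data[len(bom):].decode(encoding)
    return data.decode(default_encoding)
```

Working on bytes and decoding exactly once avoids the encode/decode round trip, which is a common source of stray characters.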

BioGeek, Dec 23 '14 15:12

I've added some support for BOMs in the new unicode-support branch. If a BOM is present, it should be used to pick the correct encoding. Can you try it out on the files that you have?

There are a few other parts to this task that I haven't done yet:

  • [x] Support BOM
  • [x] Add HEAD.CHARACTER SET head tag
  • [ ] Parse and use the HEAD.CHARACTER SET tag if there is no BOM
  • [ ] Support ANSEL (?!)
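The unchecked "no BOM" item could look roughly like the sketch below. It is only an illustration, not what the branch does. It assumes the header line has the form `1 CHAR <value>` as in GEDCOM 5.5, that the file is in an ASCII-compatible encoding (so a BOM-less UTF-16 file would not be detected), and that `ANSI` means Windows-1252. The names `encoding_from_header` and `_CODECS` are hypothetical.

```python
import re

# Matches a level-1 CHAR line such as "1 CHAR UTF-8" (GEDCOM 5.5 style).
_CHAR_LINE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)

# Assumed mapping from GEDCOM character-set names to Python codecs.
_CODECS = {"UTF-8": "utf-8", "ASCII": "ascii", "ANSI": "cp1252"}


def encoding_from_header(data, default="utf-8"):
    """Hypothetical helper: guess the codec from the header's CHAR line."""
    # The header sits at the top of the file, so a small prefix suffices.
    match = _CHAR_LINE.search(data[:4096])
    if match is None:
        return default
    name = match.group(1).decode("ascii", errors="replace").upper()
    if name == "ANSEL":
        # The Python standard library ships no ANSEL codec.
        raise NotImplementedError("ANSEL decoding is not supported")
    return _CODECS.get(name, default)
```

ANSEL is singled out because it needs a custom decoder or a third-party package, which is why it stays a separate task.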

amandasaurus, Dec 26 '14 21:12