gedcompy
Add support for Gedcom files starting with BOM
Sites like geni.com let you export Gedcom files that start with a Byte Order Mark (BOM).
Currently the regex fails for such files and you get a NotImplementedError.
See this detailed article for more about GEDCOM & the Unicode Byte Order Mark.
I'm currently toying with a solution like the one described here, which removes the BOM and then encodes/decodes the string, but I still get strange characters in the output.
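For illustration, here is a minimal sketch of the strip-and-decode approach, working on the raw bytes before anything is decoded. It is not gedcompy code: the function name `decode_gedcom_bytes` and the UTF-8 default are assumptions. Stray characters often appear when the BOM is decoded with the wrong codec, so the sketch picks the codec from the BOM itself.

```python
import codecs

# BOMs are checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_gedcom_bytes(raw, default="utf-8"):
    """Hypothetical helper: decode raw GEDCOM bytes, honouring a leading BOM.

    If a BOM is found, it is removed and the matching codec is used.
    Otherwise the bytes are decoded with `default` (an assumption).
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode(default)
```

Reading the file in binary mode (`open(path, "rb")`) and passing the bytes to this helper keeps the BOM out of the decoded text, so the first line starts with `0 HEAD` as the parser expects.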
I've added some support for BOMs in the new unicode-support branch. If a BOM is present, it should be used to select the correct encoding. Can you try it out on the files that you have?
There are a few other parts to this task that I haven't done yet:
- [x] Support BOM
- [x] Add `HEAD.CHARACTER SET` head tag
- [ ] Parse and use the `HEAD.CHARACTER SET` tag if there is no BOM
- [ ] Support ANSEL (?!)