hexml
A bad XML parser
They seem to have stopped working - sort them out
A pure Haskell implementation might go faster as it avoids FFI
@thoughtpolice in https://www.reddit.com/r/haskell/comments/5i2mg1/new_xml_parser_hexml/db5os2h/ says: > Also, IMO, the C library could be improved a bit, too e.g. it should probably be namespaced so everything is under hexml_, and I'm not...
We should be able to recode the parser loop as gotos, with zero stack/call usage, and only looking at each character once.
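To make the idea concrete, here is a minimal sketch of a goto-driven scanner, assuming a much simpler grammar than the real parser handles. The `scan_result` type and `scan` function are hypothetical names invented for this illustration and are not part of hexml. Each state is a label, each transition is a `goto`, nothing is called, and every byte is read exactly once through `*p++`.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of hexml's API. */
typedef struct {
    size_t open_tags;   /* tags that start with "<" followed by anything but '/' */
    size_t close_tags;  /* tags that start with "</" */
    size_t text_bytes;  /* bytes that lie outside any tag */
    int ok;             /* 0 if the input ended inside a tag */
} scan_result;

/* Scan a NUL-terminated buffer in a single pass with no calls and no
   recursion. Self-closing tags such as <a/> are counted as open tags. */
scan_result scan(const char *p)
{
    scan_result r = {0, 0, 0, 1};
    char c;

text:                       /* outside any tag */
    c = *p++;
    if (c == '\0') goto done;
    if (c == '<') goto tag_start;
    r.text_bytes++;
    goto text;

tag_start:                  /* just consumed '<' */
    c = *p++;
    if (c == '\0') goto fail;
    if (c == '/') { r.close_tags++; goto tag_body; }
    r.open_tags++;
    if (c == '>') goto text;
    goto tag_body;

tag_body:                   /* inside a tag, outside quotes */
    c = *p++;
    if (c == '\0') goto fail;
    if (c == '>') goto text;
    if (c == '"') goto quoted;
    goto tag_body;

quoted:                     /* inside a double-quoted attribute value */
    c = *p++;
    if (c == '\0') goto fail;
    if (c == '"') goto tag_body;
    goto quoted;

fail:
    r.ok = 0;
done:
    return r;
}
```

Because the only mutable state is the pointer and a few counters, stack usage stays constant regardless of nesting depth. A real version would also need states for names, single-quoted attributes, comments and so on.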
Currently the first thing the Haskell layer does is append `\0` to the string, forcing an entire copy and realloc. That's expensive - can it be avoided? If the last...
Should have an unescape function that undoes entity escapes, e.g. `&gt;` to `>`. May well be written exclusively on the Haskell side.
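As a rough sketch of what such a function could do, the block below decodes the five predefined XML entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`) in one left-to-right pass. The name `unescape` and its signature are assumptions made for this example, not an existing hexml function, and numeric character references such as `&#65;` are deliberately left out. It is written in C only to match the parser core; the same table-driven approach would carry over to a Haskell implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of hexml's API. Decodes the five predefined
   XML entities from src[0..len) into dst, which must hold at least len bytes
   (decoding never makes the text longer). Returns the decoded length.
   Unknown or unterminated entities are copied through unchanged. */
size_t unescape(const char *src, size_t len, char *dst)
{
    static const struct { const char *name; size_t n; char ch; } table[] = {
        {"&lt;", 4, '<'},   {"&gt;", 4, '>'},   {"&amp;", 5, '&'},
        {"&quot;", 6, '"'}, {"&apos;", 6, '\''},
    };
    size_t i = 0, out = 0;

    while (i < len) {
        size_t k;
        int matched = 0;
        if (src[i] == '&') {
            /* Try each known entity at the current position. */
            for (k = 0; k < sizeof table / sizeof table[0]; k++) {
                if (len - i >= table[k].n &&
                    memcmp(src + i, table[k].name, table[k].n) == 0) {
                    dst[out++] = table[k].ch;
                    i += table[k].n;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            dst[out++] = src[i++];  /* ordinary byte, or an unrecognised '&' */
    }
    return out;
}
```

Since input is consumed strictly left to right and decoded output is never rescanned, `&amp;lt;` decodes to the literal text `&lt;` rather than to `<`.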
Should at the very least parse it as plain text, perhaps do more and introduce Comment/CData nodes.
Addresses #1
- Don't fail the parse if we run into a CDATA.
- Do the same to CDATA as we do to comments: leave it up to the downstream...