Add support for custom entity definitions in ParsingOptions
I’m using roxmltree 0.18.0 to parse some EPUB files, but I ran into a problem with entity references. Some of the files do not contain the <!ENTITY ...> declarations for the entities they use, such as these common ones:
.add_entity("nbsp", " ")
.add_entity("copy", "©")
.add_entity("reg", "®")
This causes parsing to fail. I think it would be more convenient and flexible if the library provided a way to inject additional entity definitions through ParsingOptions, instead of having to modify the file content.
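For reference, the workaround of modifying the file content can be sketched roughly as below. This is only an illustration using the standard library: `with_entities` is a hypothetical helper, not part of roxmltree, and it assumes the input has no DOCTYPE of its own and that entity values contain no `&` or `<`.

```rust
/// Hypothetical helper (not a roxmltree API): returns a copy of `xml` with an
/// internal DTD subset declaring `entities` inserted before the root element.
/// Assumption: `xml` does not already contain a DOCTYPE.
fn with_entities(xml: &str, root: &str, entities: &[(&str, &str)]) -> String {
    let mut doctype = format!("<!DOCTYPE {} [", root);
    for (name, value) in entities {
        doctype.push_str(&format!("<!ENTITY {} \"", name));
        // Write each character as a numeric character reference so that
        // quotes and non-ASCII characters need no further escaping.
        for ch in value.chars() {
            doctype.push_str(&format!("&#x{:X};", ch as u32));
        }
        doctype.push_str("\">");
    }
    doctype.push_str("]>");

    // Keep an XML declaration, if present, at the very start of the text.
    let split = if xml.starts_with("<?xml") {
        xml.find("?>").map(|i| i + 2).unwrap_or(0)
    } else {
        0
    };
    let (head, tail) = xml.split_at(split);
    format!("{}{}{}", head, doctype, tail)
}
```

The resulting string could then be passed to `roxmltree::Document::parse_with_options` with `allow_dtd: true` set in `ParsingOptions`, since DTDs are rejected by default as far as I can tell. It works, but it means copying and rewriting every file, which is what I would like to avoid.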
Is this possible to implement? Thank you for your consideration.
EPUB is HTML, not XML. And HTML parsing is out of scope.
XHTML to be precise, at least for the task I am working on. It works well except for the entity definition part.
I'm not familiar with XHTML, but if it can be handled by generic XML parsers, then sure, we could add user-defined entities support.
I'm not sure what would be the correct way to implement this. Everything in XML is very complicated. Will take a look.