
Add support for custom entity definitions in ParsingOptions

Open nihilox opened this issue 2 years ago • 3 comments

I’m using roxmltree 0.18.0 to parse some EPUB files, but I ran into a problem with entity references. Some of the files use entities that are not declared with the standard <!ENTITY ...> syntax, such as these common ones:

.add_entity("nbsp", " ")
.add_entity("copy", "©")
.add_entity("reg", "®")

This causes parsing to fail. I think it would be more convenient and flexible if the library provided a way to inject additional entity definitions through ParsingOptions, so that I don't have to modify the file content.

Is this possible to implement? Thank you for your consideration.
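For reference, the "modify the file content" workaround I'm using now looks roughly like the sketch below. It uses only the standard library and does not touch roxmltree's API: the helper name `replace_named_entities` and the entity table are my own, purely for illustration. It rewrites named references into numeric character references, which an XML parser accepts without any DTD. It is a plain textual replacement, so it would also rewrite matches inside CDATA sections, which is one reason I would prefer a proper option in the parser.

```rust
/// Replace named entity references such as `&nbsp;` with numeric character
/// references such as `&#160;`, which need no <!ENTITY ...> declaration.
/// Hypothetical helper, not part of roxmltree.
fn replace_named_entities(xml: &str, entities: &[(&str, char)]) -> String {
    let mut out = xml.to_owned();
    for (name, ch) in entities {
        let named = format!("&{};", name);
        // Use the Unicode code point of the replacement character.
        let numeric = format!("&#{};", *ch as u32);
        out = out.replace(&named, &numeric);
    }
    out
}

fn main() {
    // Assumed entity table; extend it with whatever the input files use.
    let entities = [("nbsp", '\u{00A0}'), ("copy", '\u{00A9}'), ("reg", '\u{00AE}')];
    let input = "<p>A&nbsp;B &copy; 2023 &reg; &amp;</p>";
    let fixed = replace_named_entities(input, &entities);
    // Predefined XML entities such as &amp; are left untouched.
    println!("{}", fixed);
}
```

The rewritten string can then be passed to the parser as usual.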

nihilox, Sep 11 '23 10:09

EPUB is HTML, not XML. And HTML parsing is out of scope.

RazrFalcon, Sep 11 '23 13:09

XHTML to be precise, at least for the task I am working on. It works well except for the entity definition part.

nihilox, Sep 11 '23 16:09

I'm not familiar with XHTML, but if it can be handled by generic XML parsers, then sure, we could add support for user-defined entities.

I'm not sure what would be the correct way to implement this. Everything in XML is very complicated. Will take a look.

RazrFalcon, Sep 11 '23 17:09