HTMLp
HTMLReader handles numeric and named entity references in different ways
https://github.com/ange007/HTMLp/blob/4883f9902d88aee29570867530d804649ee15c79/HtmlReader.pas#L497
The ReadNumericEntityNode function reads entities differently from ReadNamedEntityNode. A numeric entity is read as a TEXT_NODE, while a named entity is read as an ENTITY_REFERENCE_NODE. The two functions also trigger different events, so HTMLParser handles the two cases separately, which can cause problems when parsing HTML. For example, &lt; and &#60; are handled differently, even though both represent the "<" character.
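To illustrate the point outside Delphi: in the WHATWG HTML tokenizer, named references like &lt; and numeric references like &#60; or &#x3C; are all character references that decode to the same character, U+003C. The Python sketch below is only an illustration, not HTMLp code. `resolve_char_ref` is a hypothetical helper that decodes all three forms to plain text, which is what a reader would need in order to emit a single node type (and a single event) for both. It uses the standard library's `html.entities.html5` table and deliberately skips the spec's error-recovery rules (unterminated references, remapping of invalid code points).

```python
from html import unescape
from html.entities import html5


def resolve_char_ref(ref: str) -> str:
    """Decode one complete character reference: '&lt;', '&#60;' or '&#x3C;'.

    Hypothetical helper (not part of HTMLp): every form returns plain text,
    so a caller can emit the same kind of text node for all of them.
    Simplified: no handling of missing ';' or invalid code points.
    """
    if len(ref) < 3 or not (ref.startswith("&") and ref.endswith(";")):
        raise ValueError(f"not a terminated character reference: {ref!r}")
    body = ref[1:-1]
    if body[:2] in ("#x", "#X"):
        # Hexadecimal numeric reference, e.g. &#x3C;
        return chr(int(body[2:], 16))
    if body.startswith("#"):
        # Decimal numeric reference, e.g. &#60;
        return chr(int(body[1:], 10))
    # Named reference; html5 keys include the trailing ';' (e.g. "lt;").
    # Unknown names raise KeyError.
    return html5[body + ";"]


if __name__ == "__main__":
    for ref in ("&lt;", "&#60;", "&#x3C;", "&amp;", "&#38;"):
        decoded = resolve_char_ref(ref)
        # Cross-check against the standard library's full decoder.
        assert decoded == unescape(ref)
        print(f"{ref:8} -> {decoded!r}")
```

Since every form comes back as the same string, a reader structured this way has no reason to treat &lt; and &#60; differently downstream.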
Do you know whether this is intended behavior? Does the HTML parsing spec require these to be parsed in different ways?
I can also provide a PR to fix this if needed.
Hello. I am not the original author. One of the original authors is @smsisko, but as far as I understand, he does not develop the library on GitHub (only on SourceForge, and the last version there is very old).
I just made a fork and reworked it for my own use: https://github.com/ange007/HTMLp/tree/modern (description: https://github.com/ange007/HTMLp/issues/2).
But I'm ready to accept contributions, both to the original branch and to my own, if there is interest.
Hi, I haven't used that library in a long time. Back then I made a couple of changes with the original author for a problem I ran into, but that was about it. There wasn't a lot of activity, so I was added as a maintainer. Nowadays I rarely have occasion to use Delphi (or Free Pascal), and I really haven't kept up to date with the HTML standard. Sorry I can't be more helpful.