HTMLp: HTMLReader handles numeric and named entity references in different ways

Open Quutti opened this issue 5 years ago • 2 comments

https://github.com/ange007/HTMLp/blob/4883f9902d88aee29570867530d804649ee15c79/HtmlReader.pas#L497

The ReadNumericEntityNode function reads its input differently from ReadNamedEntityNode. A numeric entity is read as a TEXT_NODE, while a named entity is read as an ENTITY_REFERENCE_NODE. Different events are triggered as well, so HTMLParser handles the two cases separately, which may cause problems when parsing HTML. For example, &lt; and &#60; are handled in different ways.
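To illustrate the behaviour being asked for, here is a minimal sketch in Python rather than Delphi. It is not HTMLp code: the `tokenize` function, the regular expression, and the tuple layout are all hypothetical, and only the node-type names are taken from the description above. It shows one way a reader could emit the same node type for named, decimal, and hexadecimal references, relying on the standard library's `html.unescape` for decoding.

```python
import re
from html import unescape

# Hypothetical node-type labels, named after the ones mentioned in this issue.
TEXT_NODE = "TEXT_NODE"
ENTITY_REFERENCE_NODE = "ENTITY_REFERENCE_NODE"

# Matches named (&lt;), decimal (&#60;) and hexadecimal (&#x3C;) references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def tokenize(text):
    """Split text into (node_type, raw, decoded) tuples.

    Every reference form gets the same node type, so a consumer
    does not need separate code paths for numeric and named forms.
    """
    nodes = []
    pos = 0
    for match in ENTITY_RE.finditer(text):
        if match.start() > pos:
            # Plain text between the previous reference and this one.
            chunk = text[pos:match.start()]
            nodes.append((TEXT_NODE, chunk, chunk))
        raw = match.group(0)
        nodes.append((ENTITY_REFERENCE_NODE, raw, unescape(raw)))
        pos = match.end()
    if pos < len(text):
        # Trailing plain text after the last reference.
        chunk = text[pos:]
        nodes.append((TEXT_NODE, chunk, chunk))
    return nodes
```

With this sketch, `tokenize("a&lt;b")` and `tokenize("a&#60;b")` produce the same sequence of node types, and the reference node decodes to `<` in both cases. Whether HTMLp should unify on ENTITY_REFERENCE_NODE or on TEXT_NODE is a separate design question that the sketch does not settle.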

Do you know whether this is intended functionality? Does the HTML parsing spec state that these have to be parsed in different ways?

I can also provide a PR to fix this if needed.

Oct 28 '19 14:10 Quutti

Hello. I am not the original author. One of the original authors is @smsisko, but as far as I understand, he does not develop the library on GitHub (only on SourceForge, and the last version there is very old).

I just made a fork and reworked it for myself: https://github.com/ange007/HTMLp/tree/modern Description: https://github.com/ange007/HTMLp/issues/2

But I'm ready to accept edits, both in the original branch and in my own, if that is of interest.

Oct 30 '19 20:10 ange007

Hi, I haven't used that library in a long time. Back then I made a couple of changes with the original author for a problem I ran into, but that was about it. There wasn't a lot of activity, so I was added as a maintainer. Nowadays I don't often have occasion to use Delphi (or Free Pascal), and I really haven't kept up to date with the HTML standard. Sorry I can't be more helpful.

Oct 31 '19 00:10 smsisko
