htmlparser2

Parser incorrectly recognizes < (less than) as a starting tag

Open prajal-alation opened this issue 1 year ago • 3 comments

The parser doesn't check whether the text after a < is a valid HTML tag name. It should treat < as the start of a tag only when a valid tag name follows, and only then discard everything after the starting tag if no closing tag is found.

Taking the example from: https://github.com/apostrophecms/sanitize-html/issues/339

If you can find this for <$40, it's a steal! I would highly recommend getting it

After this text is run through sanitize-html, which uses htmlparser2, the string is truncated to the text before the < symbol and the remainder is discarded. Is there a setting I am missing, or is this a bug?

Input: If you can find this for <$40, it's a steal! I would highly recommend getting it

Result: If you can find this for

Expected: If you can find this for <$40, it's a steal! I would highly recommend getting it

prajal-alation avatar Aug 31 '23 14:08 prajal-alation

You must either be using an old version of htmlparser2, or have xmlMode enabled. Current versions of the module will skip over <$.

fb55 avatar Sep 01 '23 07:09 fb55
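For context, here is a rough sketch of the tag-open rule from the HTML standard's tokenizer, which explains why <$ is left alone. It assumes htmlparser2's non-XML mode follows the same rule; the function name opensMarkup is made up for illustration and this is not the library's actual code.

```typescript
// Simplified sketch of the HTML "tag open" rule; not htmlparser2's actual code.
// After "<", the tokenizer only leaves plain-text mode for a few characters.
function opensMarkup(text: string, index: number): boolean {
  if (text[index] !== "<") return false;
  const next = text[index + 1] ?? "";
  // ASCII letter: start tag, "/": end tag, "!": comment or doctype, "?": bogus comment
  return /^[A-Za-z\/!?]$/.test(next);
}

const price = "If you can find this for <$40, it's a steal!";
const expr = "event_time<current_time()";
console.log(opensMarkup(price, price.indexOf("<"))); // false: "$" cannot start a tag
console.log(opensMarkup(expr, expr.indexOf("<")));   // true: "<c" looks like a tag
```

Under this rule any letter directly after < begins a tag name, regardless of whether that name is a real HTML element.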

@fb55 That one works in the latest version, but I have another use case: for internal DB functions that involve a time dimension, the parser incorrectly recognizes a tag (as it does for any word directly after <).

Example: event_time<current_time()

This gets trimmed down to: event_time

Any idea what a workaround for that could be?

prajal-alation avatar Sep 01 '23 19:09 prajal-alation

@fb55 Any update on the above?

prajal-alation avatar Oct 31 '23 17:10 prajal-alation
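One possible workaround for the event_time<current_time() case, as a minimal sketch rather than anything the maintainers have suggested: pre-process the input and replace every < that does not begin a known tag with &lt; before it reaches the parser. The helper escapeStrayLessThan and the KNOWN_TAGS allowlist are hypothetical names, and the approach assumes the parser decodes entities, so that &lt; comes back out as a literal < in text.

```typescript
// Hypothetical allowlist; extend it with the tags your content may contain.
const KNOWN_TAGS = new Set([
  "a", "b", "br", "code", "em", "i", "li", "ol", "p", "pre", "strong", "ul",
]);

// Replace "<" with "&lt;" unless it begins an opening or closing tag whose
// name is in KNOWN_TAGS and is followed by whitespace, "/" or ">".
function escapeStrayLessThan(input: string): string {
  return input.replace(
    /<(?:\/?([A-Za-z][A-Za-z0-9-]*)(?=[\s\/>]))?/g,
    (match: string, name?: string) =>
      name !== undefined && KNOWN_TAGS.has(name.toLowerCase())
        ? match                        // looks like a real tag: keep as-is
        : "&lt;" + match.slice(1),     // stray "<": escape it
  );
}

console.log(escapeStrayLessThan("event_time<current_time()"));
// event_time&lt;current_time()
console.log(escapeStrayLessThan("<b>cheap</b> at <$40"));
// <b>cheap</b> at &lt;$40
```

The escaped string can then be passed to the sanitizer or parser as usual. Note that tags outside the allowlist are escaped too, so they end up as visible text instead of being parsed.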