
Empty tag attributes are not parsed correctly

Open 1player opened this issue 1 year ago • 1 comments

iex(4)> Floki.parse_document("<a href></a>")
{:ok, [{"a", [{"href", "href"}], []}]}

Floki interprets this example as if it were <a href="href">, which is wrong. I would expect Floki either to represent the empty attribute as an empty string or to omit it altogether.

1player avatar Apr 04 '24 09:04 1player

Does not seem to affect fast_html

1player avatar Apr 04 '24 09:04 1player

This is a limitation of the default parser, mochiweb_html. Please try FastHTML or HTML5ever instead, as the README suggests.
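
For anyone landing here, a minimal sketch of switching parsers for a single call follows. It assumes the `fast_html` package is available (it compiles native code, so a C toolchain is needed); the version requirements are illustrative and not taken from this thread. No output is shown for the FastHTML call because none was posted in this issue.

```elixir
# Sketch only: the version requirements below are assumptions, adjust to your project.
Mix.install([
  {:floki, "~> 0.36"},
  {:fast_html, "~> 2.0"}
])

html = "<a href></a>"

# Default parser (mochiweb_html): this is the call from the original report.
{:ok, default_tree} = Floki.parse_document(html)
IO.inspect(default_tree, label: "mochiweb_html")

# Same document, with the parser selected per call through the :html_parser option.
{:ok, fast_tree} = Floki.parse_document(html, html_parser: Floki.HTMLParser.FastHtml)
IO.inspect(fast_tree, label: "fast_html")
```

To switch the parser for a whole application rather than per call, the README describes setting `config :floki, :html_parser, Floki.HTMLParser.FastHtml` (or `Floki.HTMLParser.Html5ever`) in the project config.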

philss avatar Jun 06 '24 16:06 philss

That's worth documenting, rather than closing this bug as "completed", no? What's the point of shipping with a broken parser?



1player avatar Jun 07 '24 08:06 1player

@1player sorry, I didn't mean to sound rude. I only wanted to point out that this is documented in our README: https://github.com/philss/floki?tab=readme-ov-file#alternative-html-parsers

philss avatar Jun 10 '24 23:06 philss