html5ever
Stripping of bad input
This is a feature request for the ability to “correct” bad input, by stripping away parts that are not conforming to the spec. This is mostly useful for sanitizers, especially for pages that want to ensure that their contents are valid HTML.
This is a feature request for html5ever because of the intimate knowledge of the spec required.
Interesting proposal. Initially I thought that just serializing the parsed output wouldn't work.
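The issue is that valid HTML is different from parsed HTML: the parser accepts input that the spec does not consider valid. E.g. valid HTML can't have two elements with the same id, yet such a document still parses, etc.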
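To make the duplicate id point concrete, here is a minimal sketch of the kind of check a sanitizer would need on top of parsing. It uses Python's standard `html.parser` as a stand-in, not html5ever, and the names `IdCollector` and `duplicate_ids` are made up for illustration. It only shows that a document with a repeated id parses without complaint, so validity has to be checked separately.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Hypothetical helper: records every non-empty id attribute on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by the parser.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(markup):
    """Return the sorted id values that appear on more than one element."""
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Parses fine, but is not valid HTML because the id "a" is repeated.
    print(duplicate_ids('<p id="a">one</p><p id="a">two</p>'))
```

What to do with the duplicates (drop the attribute, rename it, reject the input) is a policy decision for the sanitizer, which is part of why round-tripping through the parser alone is not enough.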