[feature request] HTML5 parser for JRuby implementation

Open • flavorjones opened this issue 4 years ago • 1 comment

This issue is a placeholder for collaboration with the JRuby community to find a way to provide HTML5-compliant parsing for Nokogiri's JRuby implementation.

#2204 provides an HTML5 parser for the CRuby implementation by leveraging the Gumbo parser, implemented in C, and a C extension that is tightly coupled to libxml2. As a result, the Nokogiri::HTML5 module will not be immediately available on JRuby, which uses Xerces in place of libxml2.
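Until an HTML5 parser exists on JRuby, code that must run on both implementations could feature-detect the HTML5 parser and fall back to the HTML4 parser. The following is a minimal sketch, not an official recommendation. It assumes that `Nokogiri::HTML5` is simply left undefined on JRuby and that `Nokogiri::HTML4` exists (Nokogiri 1.12 and later). `html_document_parser` is a hypothetical helper name and is not part of Nokogiri's API.

```ruby
# Minimal sketch: pick the best HTML parser Nokogiri offers on this platform.
# Assumption: Nokogiri::HTML5 is only defined where the Gumbo-based parser
# from #2204 is available (CRuby), and is absent on JRuby.
begin
  require "nokogiri"
rescue LoadError
  # Nokogiri is not installed; html_document_parser will return nil.
end

# Hypothetical helper, not part of Nokogiri's API.
def html_document_parser
  return nil unless defined?(Nokogiri)

  if defined?(Nokogiri::HTML5)
    Nokogiri::HTML5 # HTML5-compliant parser (Gumbo, CRuby only)
  elsif defined?(Nokogiri::HTML4)
    Nokogiri::HTML4 # libxml2 on CRuby, Xerces-based on JRuby
  else
    Nokogiri::HTML # older Nokogiri releases without the HTML4 name
  end
end

parser = html_document_parser
if parser
  doc = parser.parse("<p>Hello</p>")
  puts doc.at_css("p").text
end
```

On JRuby the fallback parser does not implement the HTML5 tree-construction rules, so the resulting document tree may differ from the CRuby result for malformed markup.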

The Nokogiri maintainers consider this important and hope to work on it in the future. If you're interested in helping with HTML5 support on JRuby, please comment on this issue or ping the maintainers on the mailing list or the Discord channel.

flavorjones • Apr 29 '21 07:04

Possible starting point: https://about.validator.nu/htmlparser/

rubys • Apr 29 '21 12:04