htmlparser
Please add back `nu.validator.htmlparser.tools`
The original 1.4 distribution contained some example apps that could be used from the command line. The author stated:
Sample Apps
The jar file contains sample `main()` entry points:
- `nu.validator.htmlparser.tools.XSLT4HTML5`
- `nu.validator.htmlparser.tools.XSLT4HTML5XOM`
- `nu.validator.htmlparser.tools.HTML2XML`
- `nu.validator.htmlparser.tools.XML2HTML`
- `nu.validator.htmlparser.tools.XML2XML`
- `nu.validator.htmlparser.tools.HTML2HTML`

The first two are sample apps that demo the use of XSLT with HTML5. The first one can use SAX or DOM and requires the Xalan serializer. The second one uses XOM. Running without parameters dumps usage help.
`java -cp htmlparser-1.4.jar nu.validator.htmlparser.tools.XSLT4HTML5 --template=sort-ul.xsl --input-html=test.html --output-html=out.html --mode=dom`

HTML2XML converts HTML5 to XML 1.0 plus Namespaces. With no arguments, it reads from stdio and writes to stdout. With one parameter, it reads the named file and writes to stdout. With two parameters, the first is the input file name and the second is the output file name.
XML2HTML, HTML2HTML and XML2XML work analogously. The *2HTML versions produce bad output if the document tree is not serializable as HTML5. It is up to the user to make sure that it is.
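To make the argument convention quoted above concrete, here is a minimal sketch using only the JDK. It is not the code from `test-src/`: the class name `ArgConventionSketch`, the `Converter` interface, and the `run` method are hypothetical names invented for illustration, and the conversion step is a placeholder that copies bytes unchanged. The real tools would presumably parse the input and serialize the resulting tree at that point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch, not the code in test-src/: it only models the
// "zero, one, or two arguments" convention described in the quoted text.
class ArgConventionSketch {

    /** Stand-in for the real conversion step (for example HTML5 in, XML out). */
    interface Converter {
        void convert(InputStream in, OutputStream out) throws IOException;
    }

    /**
     * Picks the input and output according to the number of arguments,
     * then hands both streams to the converter.
     */
    static void run(String[] args, InputStream stdin, OutputStream stdout,
                    Converter converter) throws IOException {
        if (args.length > 2) {
            throw new IllegalArgumentException("expected at most two arguments");
        }
        // No arguments: read standard input. Otherwise the first argument
        // names the input file.
        InputStream in = args.length >= 1
                ? Files.newInputStream(Paths.get(args[0]))
                : stdin;
        try {
            // Two arguments: the second names the output file. Otherwise
            // write to standard output.
            OutputStream out = args.length == 2
                    ? Files.newOutputStream(Paths.get(args[1]))
                    : stdout;
            try {
                converter.convert(in, out);
                out.flush();
            } finally {
                // Close only streams this method opened itself.
                if (out != stdout) out.close();
            }
        } finally {
            if (in != stdin) in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder converter (an assumption): copies bytes unchanged.
        run(args, System.in, System.out, (in, out) -> in.transferTo(out));
    }
}
```

Passing the standard streams in as parameters is a design choice made here so that the dispatch logic can be exercised without touching the real `System.in` and `System.out`.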
The source code is in `test-src/nu/validator/htmlparser/tools/`, but none of the releases I found on Maven Central includes the compiled classes. I do have an older JAR from years ago, also named `htmlparser-1.4.jar` on disk, that contains these classes and is therefore usable from the CLI.
May I kindly ask you to bring these back, so that one can convert HTML into XHTML directly from the command line? Thank you!
As far as I can tell, no existing tool does what HTML2XML does, and the obvious ways of writing one (e.g., Python BeautifulSoup, HTML Tidy) donʼt actually work right, especially around namespaces.
The version here also isnʼt ideal (Iʼm planning to submit another PR about that in a few minutes), but it would be better than everything else I could find, i.e., nothing.