html-parser topic
NSoup
NSoup is a .NET port of the jsoup HTML parser and sanitizer (http://jsoup.org), originally written in Java.
jusText
Heuristic-based boilerplate removal tool.
html5parser
A super tiny and fast HTML5 AST parser.
floki
Floki is a simple HTML parser that lets you search for nodes using CSS selectors.
save-for-offline
Android app for saving webpages for offline reading.
Modest
Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
myhtml
Fast C/C++ HTML5 parser that uses threads.
HtmlMonkey
Lightweight HTML/XML parser written in C#.
AdvancedHTMLParser
Fast, indexed Python HTML parser that builds a DOM node tree and provides common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
jodd
Jodd! Lightweight. Java. Zero dependencies. Use what you like.
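For readers unfamiliar with "heuristic-based boilerplate removal" (the jusText entry), the following is a heavily simplified sketch of the idea using only Python's standard library. It splits a page into text blocks at block-level tags and keeps only blocks that are long enough and not dominated by link text. This is NOT jusText's actual algorithm or API. As far as we know, jusText also uses stopword density and the classification of neighbouring blocks. The class `BlockCollector`, the function `extract_content`, and the thresholds `min_words` and `max_link_density` are hypothetical names with illustrative values.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an illustrative, not exhaustive, set).
BLOCK_TAGS = {"p", "div", "li", "td", "th", "h1", "h2", "h3", "h4",
              "h5", "h6", "article", "section", "header", "footer", "nav"}


class BlockCollector(HTMLParser):
    """Hypothetical helper: collects (text, link_chars) pairs per text block."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (normalized_text, chars_inside_links)
        self._text = []        # raw text pieces of the current block
        self._link_chars = 0   # normalized characters seen inside <a> tags
        self._in_link = 0      # nesting depth of <a>
        self._skip = 0         # nesting depth of <script>/<style>

    def _flush(self):
        # Collapse whitespace; record the block only if it has visible text.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS or tag == "br":
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script/style contents
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()  # emit any trailing text block


def extract_content(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not mostly link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        words = len(text.split())
        link_density = min(1.0, link_chars / len(text))
        if words >= min_words and link_density <= max_link_density:
            kept.append(text)
    return kept
```

Usage: `extract_content(page_html)` returns the paragraphs judged to be main content. A navigation bar such as "Home | About | Contact" is dropped because it is short and mostly links. The fixed thresholds are the crudest part of this sketch. Real tools tune them, and also use context from neighbouring blocks so that short paragraphs surrounded by good content are not discarded.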