Readability4J
A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it.
Hello, Mozilla's Readability filters out `` tags before processing the HTML further, as can be seen in https://github.com/mozilla/readability/blob/master/Readability.js#L633. Readability4J, however, does not do this: https://github.com/dankito/Readability4J/blob/master/src/main/kotlin/net/dankito/readability4j/processor/ArticleGrabber.kt#L753 I understood that this library...
When I use getContentWithUtf8Encoding to get the HTML value, I get incorrect data. ``` ``` It should be ``` ``` Version: ``` net.dankito.readability4j readability4j 1.0.4 ```
The port gets removed from the URI when running the method that resolves relative URIs to absolute ones. This results in a broken link for all URIs that do not use default ports (80...
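For illustration, here is a minimal sketch of resolving a relative link against a base URL while keeping a non-default port. It uses only `java.net.URI` from the JDK. The helper name `resolveKeepingPort` and the example URLs are hypothetical and are not part of Readability4J; this is not the library's own resolution code.

```kotlin
import java.net.URI

// Hypothetical helper for illustration only; not Readability4J's API.
// URI.resolve keeps the scheme and authority (host and port) of the base
// when the link itself carries neither, so a port such as 8080 survives.
// Note: links that are not valid URIs (e.g. containing spaces) make
// URI.resolve(String) throw IllegalArgumentException.
fun resolveKeepingPort(baseUrl: String, link: String): String {
    val base = URI(baseUrl)
    return base.resolve(link).toString()
}

fun main() {
    // Relative link resolved against a base URL with a non-default port.
    println(resolveKeepingPort("http://localhost:8080/articles/index.html", "../img/a.png"))
    // A link that is already absolute is returned unchanged.
    println(resolveKeepingPort("http://localhost:8080/articles/index.html", "https://example.org/x"))
}
```

Comparing the output of such a helper with the links produced for a page served on a non-default port could help narrow down where the port is lost.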
Some of the dependency versions need to be bumped by a major version to avoid vulnerabilities. Looking at a few on the Maven repository: - Jsoup 1.11.2: 2 direct vulnerabilities and multiple indirect...
Hello, First, I would like to express my appreciation to @dankito and everyone else involved for developing such a useful library as Readability4J. I encountered an issue while parsing content...