
Images inside "button" tags are stripped out

Open · PhilC813 opened this issue 1 year ago · 0 comments

https://github.com/dankito/Readability4J/blob/170d052e99db58ecac85a77cdaa63ac8253be1fd/src/main/kotlin/net/dankito/readability4j/processor/ArticleGrabber.kt#L791

This line of code seems to be the one responsible for removing the side-by-side images of this Android Authority article, which are core to the content: https://www.androidauthority.com/zerocam-ai-3498885/

The HTML structure around the images is: button > picture > source, img (<source> is a void element, so the <img> sits next to it inside <picture> rather than inside it).

Could there be an exception so that these tags are not removed when they contain an image? The logic would probably need to be somewhat more sophisticated, though, to keep button icons from making it into articles.
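A minimal sketch of the suggested exception, under stated assumptions: it assumes the linked line removes every `<button>` wholesale. `Node` is a hypothetical, simplified stand-in for the jsoup elements Readability4J works on. `cleanButtons`, `wrapsContentImage` and `MIN_CONTENT_IMAGE_WIDTH` are hypothetical names that are not part of Readability4J. The 100 px cut-off is an arbitrary illustrative threshold for telling content images apart from button icons. Instead of dropping every button, the sketch unwraps buttons that contain a `<picture>` or a sufficiently wide `<img>` and removes all others.

```kotlin
// Hypothetical, simplified stand-in for a parsed HTML element
// (Readability4J itself operates on jsoup Elements).
data class Node(
    val tag: String,
    val attrs: Map<String, String> = emptyMap(),
    val children: List<Node> = emptyList(),
)

// Hypothetical threshold: <img> elements declared narrower than this are treated as icons.
const val MIN_CONTENT_IMAGE_WIDTH = 100

// Yields this node and all of its descendants, depth-first.
fun Node.descendantsAndSelf(): Sequence<Node> = sequence {
    yield(this@descendantsAndSelf)
    for (child in children) yieldAll(child.descendantsAndSelf())
}

// Heuristic: the subtree holds content imagery if it contains a <picture>, or an <img>
// that is not explicitly declared narrower than MIN_CONTENT_IMAGE_WIDTH.
fun Node.wrapsContentImage(): Boolean =
    descendantsAndSelf().any { node ->
        when (node.tag) {
            "picture" -> true
            "img" -> {
                val width = node.attrs["width"]?.toIntOrNull()
                width == null || width >= MIN_CONTENT_IMAGE_WIDTH
            }
            else -> false
        }
    }

// Returns what replaces [node] after cleaning: non-button nodes are kept (with cleaned
// children), buttons wrapping content imagery are unwrapped (replaced by their cleaned
// children), and all other buttons are dropped.
fun cleanButtons(node: Node): List<Node> {
    val cleanedChildren = node.children.flatMap(::cleanButtons)
    return when {
        node.tag != "button" -> listOf(node.copy(children = cleanedChildren))
        node.wrapsContentImage() -> cleanedChildren // unwrap: keep the images
        else -> emptyList()                         // plain button: remove it
    }
}
```

With jsoup, the unwrap branch would roughly correspond to calling `unwrap()` on the button element. A `width` attribute is only a weak signal, since it is often missing or set through CSS, so a real fix would probably need additional checks (for example image URLs, alt text, or surrounding content) to keep button icons out of articles.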

PhilC813, Nov 13 '24 23:11