Adrien Barbaresi
@naftalibeder It doesn't appear to be going forward. Did you try building [LXML from source](https://lxml.de/build.html)?
Hi @carschno, I can reproduce the bug. Extraction with images isn't my priority but I'll try to look into it.
No, it isn't expected, but it looks quite convoluted. The backup algorithm (internal fork of readability-lxml but identical here) triggers the error: - No images, backup algorithm used, everything is...
I could be wrong but I don't see any line in the code which could be affected by that. The vertical bars are between quotation marks so they are part...
Hi @phongtnit, thanks for the suggestion. It looks like an interesting additional functionality. Would you be interested in drafting a corresponding pull request?
@pieterhartel There was a small issue here which I fixed, the rest can be explained by the orphan text at the bottom. If you write `The quick brown fox jumps...
I get your point, but the last title in your example is followed by orphan text without a tag, so the last tag seen by the parser is ``.
Hi @Seirdy, it seems like an interesting idea but I don't quite see what is currently lacking in the software. Could you please provide a concrete example of what you...
Thanks for the info, I get your point. I don't know how rare it is but I assume it is uncommon for web pages to convey information in the HTML...
Hi @fraseInc, I tend indeed to discard iframes by design as embedded content is usually not as relevant text-wise. Do you have examples of elements which should be included?