Adrien Barbaresi

415 comments by Adrien Barbaresi

Useful test case: https://github.com/miso-belica/jusText/issues/42

Issue addressed in b6808306670d3ec30cd2ac6591decac10f705fce (test data) and 7dd083a (additional XPath expressions). A few problems have been solved and there shouldn't be significant performance issues with the Polish benchmark anymore. The...

Hi @adri1wald, that's a bug indeed, thanks for filing the issue.

Hi @adri1wald, how do you currently fix this? We could use this as a quick hack. I'm not sure where the bug happens, it doesn't affect the Markdown output so...

It's now fixed, at least as far as paragraphs are concerned (cc @adri1wald).

For interaction with websites, see also [pyppeteer](https://github.com/pyppeteer/pyppeteer), a headless Chrome/Chromium automation library (an unofficial port of Puppeteer).

This is a possible way to use cookies: https://github.com/adbar/trafilatura/pull/108#issuecomment-908260901

See also #94: It would be useful to check all cases in `core.py` where element children are being iterated on to avoid (rare) endless loops (see discussion in #91). `element.iter()`...
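One common source of iteration problems is modifying a tree while looping over it. Whether that is the cause in `core.py` is an assumption; the snippet below is only a minimal sketch of the defensive pattern, using the standard library's `xml.etree.ElementTree` rather than the project's own code. The helper name `strip_empty_children` is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_empty_children(parent):
    """Remove children that have no text and no children of their own.

    Hypothetical helper for illustration, not taken from core.py.
    """
    # list(parent) takes a snapshot of the children, so removing
    # elements inside the loop cannot disturb the iteration itself.
    for child in list(parent):
        if len(child) == 0 and not (child.text or "").strip():
            parent.remove(child)
    return parent


root = ET.fromstring("<div><p>keep</p><p/><p>  </p><span>also</span></div>")
strip_empty_children(root)

# Direct children that survived the clean-up
remaining = [child.tag for child in root]
# element.iter() walks the whole subtree, including the root itself
all_tags = [el.tag for el in root.iter()]
print(remaining, all_tags)
```

Iterating over a copy costs one extra list per call, but it keeps the loop bounded by the number of children present when it started.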

Hi @phongtnit, this is indeed a bug. In future versions no `graphic` element will be displayed if the `src` attribute is missing. The webpage you mentioned features two series of...
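The behaviour described above (no `graphic` element without a `src` attribute) could be sketched as follows. This is not the library's actual implementation: the function name `drop_graphics_without_src` and the sample document are assumptions, and the standard library's `xml.etree.ElementTree` stands in for whatever tree API the project uses.

```python
import xml.etree.ElementTree as ET


def drop_graphics_without_src(root):
    """Remove <graphic> elements whose src attribute is missing or empty.

    Hypothetical helper for illustration only.
    """
    # ElementTree elements have no parent pointer, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Snapshot the matches so that removals do not affect the traversal.
    for graphic in list(root.iter("graphic")):
        parent = parent_of.get(graphic)
        if parent is not None and not graphic.get("src"):
            # Note: removing an element also discards its tail text.
            parent.remove(graphic)
    return root


doc = ET.fromstring(
    '<doc><p>text</p><graphic src="a.png"/><graphic/>'
    '<p><graphic alt="x"/></p></doc>'
)
drop_graphics_without_src(doc)
kept = [g.get("src") for g in doc.iter("graphic")]
print(kept)
```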

``-tags are typically a candidate in order to preserve formatting info, for example ``.