Adrien Barbaresi

412 comments by Adrien Barbaresi

Hi @rahulbot, thanks for your feedback. I'll need to check the webpages and the current approach to see if I can find a way to exclude related links. It can...

Status update:
- Cases 1 and 4 work out of the box (in the standard setting)
- The links at the bottom of case 2 remain (for now)
- Case...

Thanks for the answer. I added a bypass to PR #138.

Shameless plug: [trafilatura](https://github.com/adbar/trafilatura) builds upon `readability-lxml` and can convert the output to TXT, XML, CSV and JSON.

Same thing as in #150: you could try [trafilatura](https://github.com/adbar/trafilatura), which builds upon `readability-lxml`. I just tried and was able to extract the text you mentioned:
- `pip/pip3 install trafilatura`
- ...

Hi @zirkelc, thanks for the PR, the code looks good. Yes, please add basic tests for the functions you added, ideally in a new function in `tests/unit_tests.py` or in...

A test is failing because you added a file to the resource directory, could you please fix this? `assert 10

It works, thanks. You should now add tests for the lines not covered by the HTML file (see the coverage report in the "Files changed" tab).

There is something wrong with the syntax, see the flake error message.

I assume the problem is related to nested quotation marks. Could you check again?