Adrien Barbaresi

412 comments by Adrien Barbaresi

@felipehertzer I added a small script and updated the guidelines in [tests/README.rst](https://github.com/adbar/trafilatura/blob/master/tests/README.rst). Feel free to have a look: it's how I test such cases.

No worries! I'll work on the documentation, but here is how I'd answer your questions: yes, you could add more websites if they are not already in the benchmark...

@felipehertzer You'll see that I annotated metadata for some of the documents in the benchmark. Since you're interested in certain metadata, you could maybe add a small evaluation for it...
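A small metadata evaluation could be as simple as comparing extracted fields with the annotated ones, document by document. The sketch below is hypothetical: the URLs, the `author`/`date` fields, the extractor output and the `field_accuracy` helper are all illustrative assumptions, not the benchmark's actual format. It computes exact-match accuracy per field.

```python
# Hypothetical annotations: URL -> expected metadata fields.
ANNOTATED = {
    "https://example.org/a": {"author": "Jane Doe", "date": "2021-03-04"},
    "https://example.org/b": {"author": "John Roe", "date": "2020-11-30"},
}

# Hypothetical extractor output for the same documents.
EXTRACTED = {
    "https://example.org/a": {"author": "Jane Doe", "date": "2021-03-04"},
    "https://example.org/b": {"author": None, "date": "2020-11-30"},
}


def field_accuracy(annotated, extracted):
    """Per field, the share of annotated documents whose extracted value matches exactly."""
    hits, totals = {}, {}
    for url, expected in annotated.items():
        found = extracted.get(url, {})
        for field, value in expected.items():
            if value is None:  # field not annotated for this document: skip it
                continue
            totals[field] = totals.get(field, 0) + 1
            if found.get(field) == value:
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}


print(field_accuracy(ANNOTATED, EXTRACTED))  # {'author': 0.5, 'date': 1.0}
```

Exact matching is a simplification; in practice values such as author names or dates would probably need some normalization before comparison.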

I get slightly better recall with `@data-testid` alone than with `@data-testid="AuthorCard"`. The others are either not visible or have negative effects. Maybe the XPaths are not always the best...
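The recall difference comes from how broad each selector is. Here is a minimal sketch on made-up markup (the HTML below is hypothetical), using the XPath subset supported by Python's `xml.etree.ElementTree`: the attribute-presence test matches every element carrying `data-testid`, while the value test only matches `AuthorCard`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two blocks carry a data-testid attribute, only one is "AuthorCard".
DOC = """
<html><body>
  <div data-testid="AuthorCard"><span>Jane Doe</span></div>
  <div data-testid="byline"><span>John Roe</span></div>
  <div class="author"><span>Not matched</span></div>
</body></html>
"""

root = ET.fromstring(DOC.strip())

# Strict selector: elements whose data-testid equals "AuthorCard".
strict = root.findall(".//*[@data-testid='AuthorCard']")
# Broad selector: any element that has a data-testid attribute at all.
broad = root.findall(".//*[@data-testid]")


def texts(elements):
    """Concatenated text content of each matched element."""
    return ["".join(el.itertext()).strip() for el in elements]


print(texts(strict))  # ['Jane Doe']
print(texts(broad))   # ['Jane Doe', 'John Roe']
```

The broad form finds more candidates (higher recall) but can also pick up unrelated elements that happen to use `data-testid`, which is the usual precision trade-off.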

@felipehertzer Are you still working on the PR?

Thanks! There is something wrong with the tests but it's unrelated. I need to check it before merging.

I ran a few tests: the overall result is slightly worse on my data, but it's still an acceptable change. The lightbox rule clearly harms accuracy, so I removed it....

I'll add new evaluation data shortly and then test again to make sure nothing breaks with this PR. You can continue working on it if you find something else. I...

Hi, thanks for the detailed example. As you say, this seems to be a bug (item 2), a potential enhancement (button), and a problem with the source at the same...

Good point: it's not a bug in itself, the feature is just not implemented yet. Let's put that on the list.