Adrien Barbaresi
@mertdeveci5 There are no PRs at the moment, as it's not my main focus and nobody else seems to be contributing to this. Do you need both formatting and links?...
This is another issue then: not a problem between extraction options but (probably) a case where the extractor misses the relevant section of the page. Edit: see #518.
I'd prefer to work towards releasing version 1 and see from there; that includes documenting how the sources are compiled, and I'm working on it. The API is not completely stable...
@1over137 You can start working on a PR if you want; the API for the dictionary lookup strategy is stable. I also added info on additional dictionaries in the training readme.
@1over137 Did that solve your problem or do we need to work on the documentation?
Hi @zirkelc, thanks for your work, here are a few comments: - Now I get what the `isProbablyReaderable` function in the original JavaScript code is about! Thanks for pointing it...
@zirkelc I'm not sure what to do with this pull request, do you want to keep working on it by leveraging the functionality you just introduced?
I also think this reliability issue would prevent us from directly using such a metric. It's nice to have ported `is_probably_readerable()` though, and we can come back to it in...
I assume this is related to a relatively rare combination, a homepage with no main text and also no paragraphs (text in div elements). It could be an occasion to...
Hi, I've never tried using Trafilatura on Telegram posts; I need to check what's going on.