Adrien Barbaresi
Adrien Barbaresi
Hi @kvasilopoulos, the example above now works with the additional parameter `deferred_url_extractor`: ``` >>> from htmldate import find_date >>> url = "https://www.ot.gr/2022/03/23/apopseis/daimonopoiisi/" >>> find_date(url, outputformat='%Y-%m-%d %H:%M:%S', verbose=True, deferred_url_extractor=True) '2022-03-23 06:15:58'...
For further reference, here is a new URL to follow the coverage: https://app.codecov.io/gh/adbar/trafilatura/blobs/master/trafilatura/json_metadata.py
@vprelovac I may work further on the `fetch_url()` function in the future, in the meantime I chose to document how to perform the operation manually, it's in the PR above.
@chakravir Trafilatura tries to work in a generic way and there is only little potential for customization.
@swetepete are you still working on this or should we consider merging/aborting?
Note: see the API as described in corresponding [doc page](https://trafilatura.readthedocs.io/en/latest/usage-api.html).
Hi @1over137, thanks for listing the differences, I like your approach but I'm not sure how to modify the software to improve performance. Judging from your results most differences come...
Thanks for the file, it's not particularly long and I'm not sure I could directly use it (what's the source?) but it shows the problem with the current approach. I've...
No worries, thanks for providing additional information. I get your point, although the EN Wiktionary is going to get better over time using the RU Wiktionary would make sense. As...
@1over137 Yes, `greedy` works because it can take into account affixes of up to 2 characters in an unsupervised way. I decided to make it the default for languages for...