htmldate
Fast and robust date extraction from web pages, with Python or on the command-line
I have mostly tested `htmldate` on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web...
By default, dates before 1995 are considered implausible; however, changing the minimum date does not fix the issue. CLI: `htmldate -u "https://web.archive.org/web/20201205182452/https://www.lesechos.fr/1991/01/saddam-hussein-menace-larabie-saoudite-939083" -vv -min "1990-01-01"` Python: Here is the debugging...
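To make the plausibility window in this report concrete, here is a minimal standard-library sketch. The function name `is_plausible` and its parameters are hypothetical and chosen for illustration only; this is not htmldate's internal code. It only assumes the 1995 default lower bound mentioned above.

```python
from datetime import datetime
from typing import Optional

# Default lower bound, taken from the report ("dates before 1995 are implausible").
DEFAULT_MIN_DATE = datetime(1995, 1, 1)


def is_plausible(
    candidate: str,
    min_date: Optional[str] = None,
    max_date: Optional[datetime] = None,
) -> bool:
    """Return True if an ISO date string (YYYY-MM-DD) lies inside the window.

    Hypothetical helper: a missing min_date falls back to the 1995 default,
    a missing max_date falls back to the current time.
    """
    try:
        parsed = datetime.strptime(candidate, "%Y-%m-%d")
    except ValueError:
        # Strings that are not valid dates are never plausible.
        return False
    lower = datetime.strptime(min_date, "%Y-%m-%d") if min_date else DEFAULT_MIN_DATE
    upper = max_date or datetime.now()
    return lower <= parsed <= upper
```

With such a check, a 1991 date is rejected under the default bound and accepted once the minimum is lowered to 1990, which is the behaviour the report expects from the `-min` option.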
Is there a way to force htmldate to look for a datetime rather than a date, or to prioritise specific extractors over others, e.g. OpenGraph over URL extraction? Let me give you an example:...
Configuration arguments are available for the Python functions; it would be nice to make them available as command-line arguments as well: - outputformat
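As a rough sketch of what exposing `outputformat` on the command line could look like, the following uses `argparse` from the standard library. The flag name `--outputformat`, the parser, and the helper `format_date` are assumptions made for illustration; they do not describe htmldate's actual CLI.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser that accepts an output format."""
    parser = argparse.ArgumentParser(description="Hypothetical date-extraction CLI")
    parser.add_argument(
        "--outputformat",
        default="%Y-%m-%d",
        help="strftime pattern used to print the extracted date",
    )
    return parser


def format_date(date: datetime, args: argparse.Namespace) -> str:
    """Render a date using the pattern supplied on the command line."""
    return date.strftime(args.outputformat)
```

The parsed value is simply forwarded to the formatting step, so the command-line default can mirror the default of the corresponding Python argument.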
In our testing, the current code produces unreliable results on Wikipedia articles. Sometimes it returns a date, sometimes it doesn't. Wikipedia articles are constantly updated, so @coreydockser and...
A short version of the documentation is available straight from GitHub ([README.rst](https://github.com/adbar/htmldate/blob/master/README.rst)), while a more exhaustive one is present in the `docs` folder and online on [htmldate.readthedocs.io](https://htmldate.readthedocs.io). Several problems could...
In order to help new contributors it would be nice to add [pre-commit](https://pre-commit.com/) hooks to the repository with the following checks: - black - flake8 - isort - ...? The...
Dear all, Htmldate is now widely used, and it has become apparent that the GPL license is not prevalent among Python packages; its potential implications are also not easily understood....
So far only the logs provide info on this. It would be nicer to be able to pinpoint the type (header, element, or text) or even the exact location of...
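One possible shape for such a result is sketched below: a small record that carries the date together with the kind of source it came from. The names `DateHit` and `locate_date` are hypothetical, and the regular expressions are deliberately crude stand-ins for real HTML parsing; none of this reflects how htmldate works internally.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateHit:
    date: str    # date in YYYY-MM-DD form
    source: str  # "header", "element" or "text"


# Crude illustrative patterns, tried in order of priority.
HEADER_RE = re.compile(r'<meta[^>]+content="(\d{4}-\d{2}-\d{2})', re.I)
ELEMENT_RE = re.compile(r'<time[^>]+datetime="(\d{4}-\d{2}-\d{2})', re.I)
TEXT_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def locate_date(html: str) -> Optional[DateHit]:
    """Return the first date found along with the type of its source."""
    for source, pattern in (
        ("header", HEADER_RE),
        ("element", ELEMENT_RE),
        ("text", TEXT_RE),
    ):
        match = pattern.search(html)
        if match:
            return DateHit(match.group(1), source)
    return None
```

Returning the source type next to the date lets a caller see where a result came from without reading the logs.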