htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

Results 19 htmldate issues

So far `find_date()` returns a string containing the result. Adding context requires another format, and JSON is a good candidate. An optional parameter like `as_json=True` could allow for the...
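As a rough illustration of the idea, the sketch below wraps a date string in a small JSON document. The helper `date_as_json` and the field names `date` and `source` are assumptions made for this example. They are not part of htmldate, and the proposed `as_json=True` parameter itself is not implemented here, since its intended design is cut off above.

```python
import json
from typing import Optional


def date_as_json(date: Optional[str], source: Optional[str] = None) -> str:
    """Hypothetical helper (not part of htmldate): wrap a date string,
    such as the one returned by find_date(), in a JSON object."""
    # The field names are illustrative assumptions only.
    return json.dumps({"date": date, "source": source}, sort_keys=True)


# A hard-coded value stands in for a find_date() result here.
print(date_as_json("2024-07-01", source="https://example.org/post"))
```

A caller could then decode the string with `json.loads()` and read the extra context next to the date itself.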

enhancement

1. Add a `requirements-dev.txt` file with the following dependencies
   - black
   - mypy
   - types-dateparser types-python-dateutil types-urllib3
   - pytest pytest-cov
2. Update CI workflow (`.github/workflows/tests.yml`) accordingly, i.e. remove the dev...

up for grabs
maintenance

It yields

> "errorMessage": "cannot import name 'etree' from 'lxml' (/var/task/lxml/__init__.py)",

I am trying to figure out what the issue is and how to fix it.

- check the results for this site: https://www.sofi.com/online-privacy-policy/#global-privacy-control (expected: 2024-07-01)
- https://www.tesla.com/legal/privacy (expected: 2023-05-01)

@adbar

enhancement

The script does not find the date (Russian):

```python
from htmldate import find_date

url = "https://kamaz.ru/press/releases/kamaz_i_skolkovo_sozdadut_ekologicheski_chistyy_gruzovik/"
print(find_date(url, extensive_search=True))   # Returns None
print(find_date(url, extensive_search=False))  # Returns None
```

XPath selector of dates on...

bug

The setup process can be modernized by moving as many lines as possible from `setup.py` to a new `pyproject.toml` file.

up for grabs
maintenance

Hi @adbar, thanks for this awesome library. While porting this library to Go, I noticed there are two Mediacloud tests that might be wrong:

```json
"https://www.baltimoresun.com/opinion/columnists/zurawik/bs-ed-zontv-media-year-20201223-cnvrlhkhnrbihcxx6wxcxt2b7y-story.html#ed=rss_www.baltimoresun.com/arcio/rss/category/latest/": {
  "file": "1805697156.html",
  "date":...
```

question

I noticed that `htmldate` utilizes the `find_date` function, which internally relies on `examine_header`. Does it make sense to parse the response header from the server? Do servers typically default this...
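To make the question concrete, here is a minimal sketch of what reading a date from an HTTP response header could look like, using only the standard library. The function name `date_from_http_headers` is hypothetical and this is not htmldate's implementation. It assumes the server sends a `Last-Modified` header, which may be absent or may only reflect when the response was generated.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def date_from_http_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper (not htmldate's API): return the Last-Modified
    header as a YYYY-MM-DD string in UTC, or None if missing or unparsable."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("last-modified")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        # Treat naive values as UTC (an assumption of this sketch).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d")


print(date_from_http_headers({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}))
```

Returning `None` for a missing or malformed header lets a caller fall back to dates found in the page content.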

question

The relevant HTML file: [htmldate_debug_no_timezone.html.zip](https://github.com/user-attachments/files/17943812/htmldate_debug_no_timezone.html.zip) Thanks for all your hard work! `htmldate` is very useful for me. But when I use `htmldate`, I found there is no timezone in...

enhancement