missing timezone
The relevant HTML file: htmldate_debug_no_timezone.html.zip
Thanks for all your hard work! htmldate is very useful for me.
However, when I use htmldate, I find there is no timezone in the result.
I tried the following code:
from pathlib import Path

from lxml import html
from htmldate import find_date

# input_path: path to the unzipped HTML file attached above
content = Path(input_path).read_text(encoding='utf-8')
mytree = html.fromstring(content)
publish_time = find_date(mytree, outputformat="%Y-%m-%d %H:%M:%S%z")
The HTML contains:
"datePublished": "2024-11-06T08:37:00+05:30",
Expected result:
2024-11-06T08:37:00+05:30
Actual result from htmldate:
2024-11-06 00:00:00
The timezone is missing. Could you please look into this? I would be very glad to help fix it.
Hi @TheCutestCat, when dates are found using HTML markup you get the time zone; when they are extracted from free text, regexes are applied. The regular expressions don't include time zones for now. Feel free to have a look and draft a pull request. Your case is here (along with others above and below):
https://github.com/adbar/htmldate/blob/9c5f619db70fd6e32ceab6ebd63af60ff1f6b166/htmldate/extractors.py#L135
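To illustrate what a timezone-aware pattern could look like, here is a minimal standalone sketch. It is not the regex used in htmldate/extractors.py: the names `ISO_WITH_TZ` and `extract_published` are hypothetical, and it only assumes the `"datePublished"` string format quoted above. It relies on the standard library alone.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern, not the one used in htmldate/extractors.py:
# it captures an ISO 8601 timestamp plus an optional UTC offset.
ISO_WITH_TZ = re.compile(
    r'"datePublished"\s*:\s*"'
    r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:[+-]\d{2}:\d{2})?)"'
)


def extract_published(text: str) -> Optional[datetime]:
    """Return the first datePublished value as a datetime, or None."""
    match = ISO_WITH_TZ.search(text)
    if match is None:
        return None
    # fromisoformat keeps the offset, so the result is timezone-aware
    # whenever the captured string carries one.
    return datetime.fromisoformat(match.group(1))


sample = '"datePublished": "2024-11-06T08:37:00+05:30",'
parsed = extract_published(sample)
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S%z")
print(formatted)  # 2024-11-06 08:37:00+0530
```

Note that `%z` renders the offset as `+0530`. To get the `+05:30` form shown in the expected result, `parsed.isoformat()` would be the closer match. When the captured string has no offset, the datetime is naive and `%z` formats as an empty string.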