dateparser
Timezones support is deficient
I think we should rethink how the timezones are applied, as we have more than one case of abbreviations with different meanings.
I found at least these cases:
- ACT:
  - Acre Time: UTC−05
  - ASEAN Common Time (unofficial): UTC+06:30 – UTC+09
- AMT:
  - Amazon Time (Brazil): UTC−04
  - Armenia Time: UTC+04
- AST:
  - Arabia Standard Time: UTC+03
  - Atlantic Standard Time: UTC−04
- BST:
  - Bangladesh Standard Time: UTC+06
  - British Summer Time (British Standard Time from Feb 1968 to Oct 1971): UTC+01
  - Bougainville Standard Time: UTC+11
  - issues / PRs: https://github.com/scrapinghub/dateparser/pull/682
- CDT:
  - Central Daylight Time (North America): UTC−05
  - Cuba Daylight Time: UTC−04
- CST:
  - Central Standard Time (North America): UTC−06
  - Cuba Standard Time: UTC−05
  - China Standard Time: UTC+08
- ECT:
  - Ecuador Time: UTC−05
  - Eastern Caribbean Time: UTC−04
- GST:
  - Gulf Standard Time: UTC+04
  - South Georgia and the South Sandwich Islands Time: UTC−02
- IST:
  - Irish Standard Time: UTC+01
  - Israel Standard Time: UTC+02
  - Indian Standard Time: UTC+05:30
  - issues / PRs: https://github.com/scrapinghub/dateparser/issues/580 https://github.com/scrapinghub/dateparser/issues/609 https://github.com/scrapinghub/dateparser/issues/636 https://github.com/scrapinghub/dateparser/pull/637
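To make the ambiguity concrete, here is a small sketch that stores a few of the abbreviations from the list as candidate tables and flags the ones with more than one distinct offset. The names CANDIDATES and is_ambiguous are made up for illustration and are not part of dateparser; UTC is included only as an unambiguous control.

```python
from datetime import timedelta

# A few entries copied from the list above (plus UTC as an unambiguous control).
# Hypothetical structure: abbreviation -> list of (name, UTC offset).
CANDIDATES = {
    "IST": [
        ("Irish Standard Time", timedelta(hours=1)),
        ("Israel Standard Time", timedelta(hours=2)),
        ("Indian Standard Time", timedelta(hours=5, minutes=30)),
    ],
    "CST": [
        ("Central Standard Time (North America)", timedelta(hours=-6)),
        ("Cuba Standard Time", timedelta(hours=-5)),
        ("China Standard Time", timedelta(hours=8)),
    ],
    "UTC": [("Coordinated Universal Time", timedelta(0))],
}


def is_ambiguous(abbreviation):
    """Return True when an abbreviation maps to more than one distinct offset."""
    offsets = {offset for _name, offset in CANDIDATES.get(abbreviation, [])}
    return len(offsets) > 1


ambiguous = sorted(abbr for abbr in CANDIDATES if is_ambiguous(abbr))
print(ambiguous)  # ['CST', 'IST']
```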
I'm not sure which solution we should implement, but it's obvious that we cannot choose a timezone randomly.
Some ideas:
- We could choose the most widely used meaning. For example, we could change IST from "Israel Standard Time" (the current default) to "Indian Standard Time" (hard to decide, and not a solution in itself).
- We could add a warning when parsing timezones with more than one meaning (self-explanatory, but not a solution, and it could flood users who already know about the ambiguity with warning messages).
- We could add a setting like "PREFER_TIMEZONE_FROM" or something similar where users can specify whether they prefer timezones from one continent or another, or negative vs. positive offsets (I'm not sure about this, as there are some examples in the list, like CDT or ECT, where all the timezones are from the same continent/region).
- We could add a setting that allows specifying preferred timezones manually (example: TIMEZONES_EQUIVALENT = {'IST': '+5.30'}). This is a possible solution, but the behavior would be hard to discover.
- We could implement some of the above ideas together.
On the other hand, we should think about how we could support more timezone formats, like 'Australia/Sydney': https://github.com/scrapinghub/dateparser/issues/298
Please, if you have any ideas, add a comment below :point_down: Any idea is welcome!!!
Thanks for opening this discussion.
You could interpret it based on the language. For the CST example you could interpret based on whether 'en', 'es' or 'zh' has been set? Or perhaps specifically on the locale settings. E.g. CST is Cuba only if 'es-cu' and BST is British only if 'en-gb' has been set?
That doesn't solve anything if e.g. the locale is set to 'fr-fr', i.e. what should it default to? But that is also the status quo. I like your suggestion to be able to override the default behaviour with a setting: {'IST': '+5.30'}
I hadn't thought about considering the locale to give a better timezone, and I think it could be a really good idea!
I have to think more about this and how this would affect the current implementation, but it sounds good. Of course, in most cases, this wouldn't help, but in others, it could be really useful. :muscle:
@Gallaecio any thought on this? :slightly_smiling_face:
I like the ideas. What about combining some of them like this?
- Switch to new default values based on popularity. Measuring popularity can be hard, though. If in doubt, we can keep whatever the current default is.
- Log a warning when choosing a global default for an ambiguous timezone. The warning can be silenced by providing disambiguation parameters.
- Set locale-specific default values for those locales where the choice is unambiguous. These override the global default value when the locale is in the user-defined list of input locales. If the locale list leaves room for ambiguity (e.g. ['en-ie', 'en-in']), use that of the first locale from the list but log a warning.
- Allow users to define a custom mapping, which would override the locale-specific or global default.
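A minimal sketch of the precedence described in this list might look like the following. Everything here is an assumption for illustration: the function resolve, the tables GLOBAL_DEFAULTS and LOCALE_DEFAULTS, and the set AMBIGUOUS are hypothetical names, not dateparser settings or data. The global default for IST is kept at Israel Standard Time only because that is the current default mentioned in this thread.

```python
import warnings
from datetime import timedelta, timezone

# Hypothetical tables for illustration only; not dateparser's real data.
GLOBAL_DEFAULTS = {"IST": timedelta(hours=2)}  # current default per this thread
LOCALE_DEFAULTS = {
    "en-ie": {"IST": timedelta(hours=1)},
    "en-in": {"IST": timedelta(hours=5, minutes=30)},
}
AMBIGUOUS = {"IST"}


def resolve(abbr, locales=(), custom=None):
    """Resolve an abbreviation: custom mapping, then locales, then global default."""
    # 1. A user-defined mapping overrides everything else.
    if custom and abbr in custom:
        return timezone(custom[abbr], abbr)

    # 2. Locale-specific defaults, in the order the user listed the locales.
    candidates = [
        LOCALE_DEFAULTS[loc][abbr]
        for loc in locales
        if abbr in LOCALE_DEFAULTS.get(loc, {})
    ]
    if candidates:
        if len(set(candidates)) > 1:
            warnings.warn(
                f"{abbr} is ambiguous across locales {list(locales)}; "
                "using the first matching locale"
            )
        return timezone(candidates[0], abbr)

    # 3. Global default, with a warning when the abbreviation is ambiguous.
    if abbr in AMBIGUOUS:
        warnings.warn(f"{abbr} is ambiguous; falling back to the global default")
    return timezone(GLOBAL_DEFAULTS[abbr], abbr)


# A single unambiguous locale silences the warning.
print(resolve("IST", locales=["en-in"]).utcoffset(None))  # 5:30:00
# A custom mapping wins over any locale.
print(resolve("IST", ["en-ie"], {"IST": timedelta(hours=2)}).utcoffset(None))  # 2:00:00
```

With locales=['en-ie', 'en-in'] the sketch returns the Irish offset and logs a warning, which matches the "first locale from the list" rule above.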
Hello, thanks for your work on dateparser! Any update on this? I just (painfully) discovered that IST is set to Israeli time rather than Indian time. I like the idea of being able to provide a mapping in the settings. By the way, this is possible with dateutil.parser.parse.
Until a solution is found and implemented, though, I think IST should refer to Indian time rather than Israeli time, based on population (which is a proxy for popularity), don't you think? (I am neither Indian nor Israeli; this is not chauvinism.)
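For reference, the dateutil approach mentioned above goes through the tzinfos argument of dateutil.parser.parse, which accepts a mapping from abbreviation to a UTC offset in seconds (or to a tzinfo object). A short example, assuming python-dateutil is installed; the date string is made up:

```python
from datetime import timedelta

from dateutil import parser  # third-party package: python-dateutil

# Map the ambiguous abbreviation to the offset we want, in seconds east of UTC.
tzinfos = {"IST": 5 * 3600 + 30 * 60}  # Indian Standard Time, UTC+05:30

dt = parser.parse("2020-01-15 10:30 IST", tzinfos=tzinfos)
print(dt.utcoffset())  # 5:30:00
```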