cpython
Add option to output UTC datetimes as "Z" in `.isoformat()`
| BPO | 46614 |
|---|---|
| Nosy | @brettcannon, @abalkin, @merwok, @pganssle, @godlygeek |
| PRs |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = 'https://github.com/pganssle'
closed_at = None
created_at = <Date 2022-02-02.17:31:46.548>
labels = ['type-feature', 'library', '3.11']
title = 'Add option to output UTC datetimes as "Z" in `.isoformat()`'
updated_at = <Date 2022-04-04.01:30:42.788>
user = 'https://github.com/pganssle'
bugs.python.org fields:
activity = <Date 2022-04-04.01:30:42.788>
actor = 'godlygeek'
assignee = 'p-ganssle'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2022-02-02.17:31:46.548>
creator = 'p-ganssle'
dependencies = []
files = []
hgrepos = []
issue_num = 46614
keywords = ['patch']
message_count = 6.0
messages = ['412384', '412876', '413102', '416622', '416638', '416648']
nosy_count = 5.0
nosy_names = ['brett.cannon', 'belopolsky', 'eric.araujo', 'p-ganssle', 'godlygeek']
pr_nums = ['32041']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46614'
versions = ['Python 3.11']
As part of bpo-35829, it was suggested that we add the ability to output the "Z" suffix in isoformat(), so that fromisoformat() can both be the exact functional inverse of isoformat() and parse datetimes with "Z" outputs. I think that that's not a particularly compelling motivation for this, but I also see plenty of examples of datetime.utcnow().isoformat() + "Z" out there, so it seems like this is a feature that we would want to have anyway, particularly if we want to deprecate and remove utcnow.
I've spun this off into its own issue so that we can discuss how to implement the feature. The two obvious questions I see are:
- What do we call the option? use_utc_designator, allow_Z, utc_as_Z?
- What do we consider as "UTC"? Is it anything with +00:00? Just timezone.utc? Anything that seems like a fixed-offset zone with 0 offset?
For example, do we want this?
>>> LON = zoneinfo.ZoneInfo("Europe/London")
>>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
2022-03-01T00:00:00Z
>>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
2022-06-01T00:00:00+01:00
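To make that first candidate rule concrete, here is a minimal sketch of "anything with a +00:00 offset gets Z". The helper name isoformat_zero_offset_as_z is hypothetical (isoformat() has no utc_as_z parameter today), and two named fixed-offset zones stand in for Europe/London in winter and summer so the example does not depend on the tz database.

```python
from datetime import datetime, timedelta, timezone

# Stand-in (assumption) for Europe/London in winter: zero offset, named "GMT".
LONDON_WINTER = timezone(timedelta(0), "GMT")
# Stand-in (assumption) for Europe/London in summer: +01:00, named "BST".
LONDON_SUMMER = timezone(timedelta(hours=1), "BST")


def isoformat_zero_offset_as_z(dt):
    """Hypothetical helper: write "Z" whenever the UTC offset is exactly zero."""
    text = dt.isoformat()
    if dt.utcoffset() == timedelta(0):
        # An aware datetime with a zero offset always ends in "+00:00".
        return text[: -len("+00:00")] + "Z"
    # Naive datetimes and non-zero offsets are left untouched.
    return text


print(isoformat_zero_offset_as_z(datetime(2022, 3, 1, tzinfo=LONDON_WINTER)))
# 2022-03-01T00:00:00Z
print(isoformat_zero_offset_as_z(datetime(2022, 6, 1, tzinfo=LONDON_SUMMER)))
# 2022-06-01T00:00:00+01:00
```

Under this rule the same zone alternates between Z and a numeric offset across the year, which is exactly the behavior in question.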
Another possible definition might be if the tzinfo is a fixed-offset zone with offset 0:
>>> datetime.timezone.utc.utcoffset(None)
timedelta(0)
>>> zoneinfo.ZoneInfo("UTC").utcoffset(None)
timedelta(0)
>>> dateutil.tz.UTC.utcoffset(None)
timedelta(0)
>>> pytz.UTC.utcoffset(None)
timedelta(0)
The only "odd man out" is dateutil.tz.tzfile objects representing fixed offsets, since all dateutil.tz.tzfile objects return None when utcoffset or dst are passed None. This can and will be changed in future versions.
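A rough sketch of that fixed-offset definition, assuming we simply ask the tzinfo for its offset with no datetime at all. The name is_fixed_zero_offset and the toy VariableZone class are made up for illustration; the toy class only models a zone that returns None when it is passed None.

```python
from datetime import timedelta, timezone, tzinfo


class VariableZone(tzinfo):
    """Toy zone (assumption) that cannot report an offset without a datetime."""

    def utcoffset(self, dt):
        return None if dt is None else timedelta(0)

    def dst(self, dt):
        return None if dt is None else timedelta(0)

    def tzname(self, dt):
        return "GMT"


def is_fixed_zero_offset(tz):
    """Hypothetical check: True if the zone reports a zero offset for None."""
    return tz is not None and tz.utcoffset(None) == timedelta(0)


print(is_fixed_zero_offset(timezone.utc))                  # True
print(is_fixed_zero_offset(timezone(timedelta(hours=1))))  # False
print(is_fixed_zero_offset(VariableZone()))                # False
```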
I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually, but considering that people will be opting into this behavior, it is more likely that they will be surprised by datetime(2022, 3, 1, tzinfo=ZoneInfo("Europe/London")).isoformat(utc_as_z=True) returning 2022-03-01T00:00:00+00:00 than by alternation between Z and +00:00.
Yet another option might be to add a completely separate function, utc_isoformat(*args, **kwargs), which is equivalent to (in the parlance of the other proposal) dt.astimezone(timezone.utc).isoformat(*args, **kwargs, utc_as_z=True). Basically, convert any datetime to UTC and append a Z to it. The biggest footgun there would be people using it on naïve datetimes and not realizing that it would interpret them as system local times.
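As a sketch of what that separate function might look like (written here as a free function rather than a method, and built on the existing isoformat() since utc_as_z does not exist), assuming the only job is "convert to UTC, then swap the offset for Z":

```python
from datetime import datetime, timedelta, timezone


def utc_isoformat(dt, *args, **kwargs):
    """Hypothetical sketch: convert to UTC, then replace "+00:00" with "Z"."""
    # Footgun from the text above: astimezone() treats a naive datetime as
    # system local time, so naive inputs give machine-dependent results.
    text = dt.astimezone(timezone.utc).isoformat(*args, **kwargs)
    # After conversion to timezone.utc the offset suffix is always "+00:00".
    return text[: -len("+00:00")] + "Z"


plus_one = timezone(timedelta(hours=1))
print(utc_isoformat(datetime(2022, 6, 1, tzinfo=plus_one)))
# 2022-05-31T23:00:00Z
print(utc_isoformat(datetime(2022, 3, 1, 12, 30, tzinfo=timezone.utc), timespec="minutes"))
# 2022-03-01T12:30Z
```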
Would it be horrible to have the timezone instance control this?
> I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually
This is a really good point that I hadn't considered: +00:00 and Z are semantically different, and just because a datetime has a UTC offset of 0 doesn't mean it should get a Z; Z is reserved specifically for UTC.
It seems like the most semantically correct thing would be to only use Z if tzname() returns exactly "UTC". That would do the right thing for your London example for every major timezone library I'm aware of:
>>> datetime.datetime.now(zoneinfo.ZoneInfo("Europe/London")).tzname()
'GMT'
>>> datetime.datetime.now(zoneinfo.ZoneInfo("UTC")).tzname()
'UTC'
>>> datetime.datetime.now(datetime.timezone.utc).tzname()
'UTC'
>>> datetime.datetime.now(dateutil.tz.gettz("Europe/London")).tzname()
'GMT'
>>> datetime.datetime.now(dateutil.tz.UTC).tzname()
'UTC'
>>> datetime.datetime.now(pytz.timezone("Europe/London")).tzname()
'GMT'
>>> datetime.datetime.now(pytz.UTC).tzname()
'UTC'
I think the right rule to use conceptually is "if use_utc_designator is true and the timezone name is 'UTC' then use Z". We could also check the offset, but I'm not convinced we need to.
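Roughly, that rule could be sketched like this. The function isoformat_utc_as_z is a hypothetical wrapper, not an existing datetime API, and a fixed-offset zone named "GMT" stands in for Europe/London in winter. The suffix check is only a safety net so the sketch never strips anything other than "+00:00".

```python
from datetime import datetime, timedelta, timezone


def isoformat_utc_as_z(dt, utc_as_z=False, **kwargs):
    """Hypothetical sketch of the name-based rule; not a real datetime API."""
    text = dt.isoformat(**kwargs)
    # Use "Z" only when opted in and the zone calls itself exactly "UTC".
    if utc_as_z and dt.tzname() == "UTC" and text.endswith("+00:00"):
        return text[: -len("+00:00")] + "Z"
    return text


# Stand-in (assumption) for Europe/London in winter: zero offset, named "GMT".
gmt = timezone(timedelta(0), "GMT")

print(isoformat_utc_as_z(datetime(2022, 3, 1, tzinfo=timezone.utc), utc_as_z=True))
# 2022-03-01T00:00:00Z
print(isoformat_utc_as_z(datetime(2022, 3, 1, tzinfo=gmt), utc_as_z=True))
# 2022-03-01T00:00:00+00:00
```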
I think this approach is probably the best we can do, but I could also imagine that users might find it to be confusing behavior. I wonder if there's any informal user testing we can do?
I guess the ISO 8601 spec does call "Z" the "UTC designator", so use_utc_designator seems like approximately the right name. My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z. Without reading the documentation (which we can assume 99% of users won't do), you might assume that dt.isoformat(use_utc_designator=True) would translate to dt.astimezone(timezone.utc).replace(tzinfo=None).isoformat() + "Z".
A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly. Would be worth throwing it to a poll or something before merging.
Bad idea: pass zulu=True
It is short, memorable if you know about it, otherwise obscure enough to push people to read the docs and be clear about what it does.
Also strange and far from obvious, so a bad idea. Unless… ?
> My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z, without reading the documentation (which we can assume 99% of users won't do)
I was thinking along similar lines when I used use_utc_designator in the PR, but I drew a different conclusion. I was thinking that the name use_utc_designator is sufficiently abstruse that no one would even be able to guess that it's referring to "Z" without actually reading the documentation for the parameter. In particular, I worry that zulu=True or allow_Z=True might lead people to make the mistake of thinking that they'll always get "Z" instead of "+00:00".
> A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly.
This would definitely be more memorable and more approachable. If we stick with making it conditional on tzname() == "UTC", I definitely think we want to have "utc" in the name of the parameter, and utc_as_z satisfies that.
utc_as_z seems reasonable to me. Let me know if you'd like me to update the PR.
After running a survey where I tried out 4 different keyword arguments at random with 972 participants (not all of them completed the survey all the way, admittedly) and asked people what they thought it does in various situations, I got the following results:
Percentage getting semantics question right by kwarg:
| kwarg | naïve | utc | nyc | lon |
|---|---|---|---|---|
| allow_z | 50.495 | 89.47 | 80.90 | 42.35 |
| format_utc_as_z | 41.304 | 88.68 | 54.88 | 40.00 |
| use_utc_designator | 22.857 | 73.68 | 47.87 | 39.29 |
| utc_as_z | 38.542 | 93.27 | 57.30 | 42.86 |
This is strange because I think none of us like allow_z, including the survey participants. When told the actual semantics of the keyword argument, they overwhelmingly prefer utc_as_z:
| Preferred kwarg | Count |
|---|---|
| utc_as_z | 165 |
| format_utc_as_z | 78 |
| I don't like any of these | 42 |
| allow_z | 19 |
| use_utc_designator | 12 |
I was hoping the results here would be less... ambiguous.
Given these results and the fact that people don't seem to have a great grasp on what this is supposed to do, let's push this feature to 3.12 to try to come up with a better design.
I have to disagree about this approach:
>>> LON = zoneinfo.ZoneInfo("Europe/London")
>>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
2022-03-01T00:00:00Z
>>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
2022-06-01T00:00:00+01:00
Mainly because you may expect your local time zone to always produce the second format (a numeric offset) rather than the first one (Z as a shortcut for +00:00). I think Z should only be used when the timezone is UTC itself rather than a local zone (or perhaps when the timezone is missing?).
At the end what we want is to avoid this => https://i.redd.it/ocpk67fp6tx81.jpg
After seeing usage in the PR, I find use_utc_designator=True a bit unwieldy, especially as it will nearly always be a keyword param.
I suggest this is controlled in part by the tzinfo or time instance.
For instance, in datetime.py this code in class time
class time:
    # [...]
    def _tzstr(self):
        """Return formatted timezone offset (+xx:xx) or an empty string."""
        off = self.utcoffset()
        return _format_offset(off)
can be extended as something like
class time:
    # [...]
    def _tzstr(self):
        """Return formatted timezone offset (+xx:xx) or an empty string."""
        off = self.utcoffset()
        if not off and self.tzname() == "UTC":
            return "Z"
        return _format_offset(off)
A respective change would be needed in class datetime.
The string UTC is produced by
class timezone(tzinfo):
    # ...
    @staticmethod
    def _name_from_offset(delta):
        if not delta:
            return 'UTC'
        # ...
Changing the default behavior of the datetime formatting functions to output Z instead of +00:00 for UTC timestamps might cause backward compatibility problems, so a utc_as_z=True argument might be needed.
But I think the proposed utc_as_z=True is neither very elegant nor very useful.
So looking at the declaration of isoformat
class datetime:
    # ...
    def isoformat(self, sep='T', timespec='auto'):
        # ...

class time:
    # ...
    def isoformat(self, timespec='auto'):
        # ...
Perhaps we could use a new tzspec argument, named to match timespec, with values 'auto', 'utcz', 'hours', 'minutes', 'seconds', etc. 'auto' would be the current behavior. 'utcz' would be the new behavior of using 'Z' instead of '+00:00', and more values ('hours', 'minutes', 'seconds') could be reserved and defined later to control the behavior of the _format_offset() function.
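A minimal sketch of how tzspec might behave, written as a standalone wrapper because isoformat() accepts no such argument today. The function name isoformat_tzspec is hypothetical, only 'auto' and 'utcz' are implemented, and the reserved values are rejected since their semantics are not defined here. The 'utcz' branch reuses the zero-offset-and-named-"UTC" test from the _tzstr idea above.

```python
from datetime import datetime, timedelta, timezone


def isoformat_tzspec(dt, tzspec="auto", **kwargs):
    """Hypothetical sketch of a tzspec option; not a real datetime API."""
    text = dt.isoformat(**kwargs)
    if tzspec == "auto":
        # Current behavior: numeric offset, or nothing for naive datetimes.
        return text
    if tzspec == "utcz":
        # "Z" only for a zero offset on a zone that names itself "UTC".
        if dt.utcoffset() == timedelta(0) and dt.tzname() == "UTC":
            return text[: -len("+00:00")] + "Z"
        return text
    # 'hours', 'minutes', 'seconds' are reserved in the proposal, not defined.
    raise ValueError(f"unsupported tzspec: {tzspec!r}")


stamp = datetime(2022, 3, 1, tzinfo=timezone.utc)
print(isoformat_tzspec(stamp))                  # 2022-03-01T00:00:00+00:00
print(isoformat_tzspec(stamp, tzspec="utcz"))   # 2022-03-01T00:00:00Z
```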
It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in eg. JavaScript. There's no need for a new option.
> It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in eg. JavaScript. There's no need for a new option.
In principle, I support this. In practice, I suspect this won't be done because changing the default behaviour of isoformat would break code that depends on the UTC offset always being formatted the way it currently is.
I object to the idea that using Z unconditionally is "the right thing", but also there are several logistical problems that make the idea of changing the default behavior a non-starter:
- Backwards compatibility: this alone will scuttle the proposal, because there are absolutely people relying on the fact that this outputs +00:00 instead of Z.
- What it means to be "UTC" is ill-defined, as mentioned in the first post. Non-UTC datetimes might incidentally have a +00:00 offset, and it isn't clear that it is appropriate to automatically give them Z, in which case we need to come up with a reliable heuristic for what it means to be "UTC". Making this behavior auto-magical will make it even harder for people to discover the edge cases.
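To illustrate the backwards compatibility point, here is a made-up example of the kind of downstream code that would break. The offset_minutes parser is hypothetical and stands for any consumer that assumes isoformat() always ends an aware timestamp with a numeric offset.

```python
import re
from datetime import datetime, timezone

# Hypothetical consumer: expects a trailing +HH:MM or -HH:MM offset.
OFFSET_RE = re.compile(r"([+-])(\d{2}):(\d{2})$")


def offset_minutes(text):
    """Return the UTC offset in minutes, or raise if none is present."""
    match = OFFSET_RE.search(text)
    if match is None:
        raise ValueError(f"no numeric UTC offset in {text!r}")
    sign, hours, minutes = match.groups()
    total = int(hours) * 60 + int(minutes)
    return -total if sign == "-" else total


# Works with what isoformat() emits today for a UTC datetime.
print(offset_minutes(datetime(2022, 3, 1, tzinfo=timezone.utc).isoformat()))  # 0

# Fails if the same timestamp were written with "Z" by default.
try:
    offset_minutes("2022-03-01T00:00:00Z")
except ValueError as exc:
    print("broken consumer:", exc)
```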
Whatever we do needs to be explicit, but it seems very hard to do that, since when we ran the survey people seemed to have conflicting understandings of what any of this stuff might do; there doesn't seem to be an unambiguous way to convey what the behavior might be.
If this were a high priority and in high demand, I would suggest we might bite the bullet and have this be just one more thing that end users need to learn about datetime, but it seems like the main motivation for adding this option was to make it so that formats ending with Z would technically satisfy the pre-3.11 contract of fromisoformat (parsing any format that .isoformat can emit), which doesn't even apply anymore.
It might be worth adding an isoformatutc method in that case. It would kill two birds with one stone: convert the time zone to UTC and add the Z suffix, which is useful for machine processing. This is the same as JS's toISOString.