
Add option to output UTC datetimes as "Z" in `.isoformat()`

Open pganssle opened this issue 3 years ago • 15 comments

BPO 46614
Nosy @brettcannon, @abalkin, @merwok, @pganssle, @godlygeek
PRs
  • python/cpython#32041
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/pganssle'
    closed_at = None
    created_at = <Date 2022-02-02.17:31:46.548>
    labels = ['type-feature', 'library', '3.11']
    title = 'Add option to output UTC datetimes as "Z" in `.isoformat()`'
    updated_at = <Date 2022-04-04.01:30:42.788>
    user = 'https://github.com/pganssle'
    

    bugs.python.org fields:

    activity = <Date 2022-04-04.01:30:42.788>
    actor = 'godlygeek'
    assignee = 'p-ganssle'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-02-02.17:31:46.548>
    creator = 'p-ganssle'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46614
    keywords = ['patch']
    message_count = 6.0
    messages = ['412384', '412876', '413102', '416622', '416638', '416648']
    nosy_count = 5.0
    nosy_names = ['brett.cannon', 'belopolsky', 'eric.araujo', 'p-ganssle', 'godlygeek']
    pr_nums = ['32041']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue46614'
    versions = ['Python 3.11']
    

    pganssle avatar Feb 02 '22 17:02 pganssle

    As part of bpo-35829, it was suggested that we add the ability to output the "Z" suffix in isoformat(), so that fromisoformat() can both be the exact functional inverse of isoformat() and parse datetimes with "Z" outputs. I think that that's not a particularly compelling motivation for this, but I also see plenty of examples of datetime.utcnow().isoformat() + "Z" out there, so it seems like this is a feature that we would want to have anyway, particularly if we want to deprecate and remove utcnow.
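For context, the workaround mentioned above can also be written against an aware datetime rather than `utcnow()`. This is only a minimal sketch: the helper name `format_with_z` is hypothetical and is not part of the proposal, and it deliberately accepts nothing but `timezone.utc`.

```python
from datetime import datetime, timezone


def format_with_z(dt):
    """Format a datetime whose tzinfo is timezone.utc with a trailing "Z".

    Hypothetical helper for illustration; not a stdlib API.
    """
    if dt.tzinfo is not timezone.utc:
        raise ValueError("expected a datetime whose tzinfo is timezone.utc")
    # Dropping the tzinfo makes isoformat() omit "+00:00"; then add the designator.
    return dt.replace(tzinfo=None).isoformat() + "Z"


# Aware counterpart of the commonly seen datetime.utcnow().isoformat() + "Z":
stamp = format_with_z(datetime.now(timezone.utc))
```

Unlike the `utcnow()` pattern, this never attaches a "Z" to a naive datetime whose relationship to UTC is unknown.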

    I've spun this off into its own issue so that we can discuss how to implement the feature. The two obvious questions I see are:

    1. What do we call the option? use_utc_designator, allow_Z, utc_as_Z?
    2. What do we consider as "UTC"? Is it anything with +00:00? Just timezone.utc? Anything that seems like a fixed-offset zone with 0 offset?

    For example, do we want this?

    >>> LON = zoneinfo.ZoneInfo("Europe/London")
    >>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-03-01T00:00:00Z
    >>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-06-01T00:00:00+01:00
    

    Another possible definition might be if the tzinfo is a fixed-offset zone with offset 0:

    >>> datetime.timezone.utc.utcoffset(None)
    timedelta(0)
    >>> zoneinfo.ZoneInfo("UTC").utcoffset(None)
    timedelta(0)
    >>> dateutil.tz.UTC.utcoffset(None)
    timedelta(0)
    >>> pytz.UTC.utcoffset(None)
    timedelta(0)
    

    The only "odd man out" is dateutil.tz.tzfile objects representing fixed offsets, since all dateutil.tz.tzfile objects return None when utcoffset or dst are passed None. This can and will be changed in future versions.
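The fixed-offset definition above could be checked roughly as follows. This is a sketch under assumptions: `is_fixed_zero_offset` and `VariableZone` are hypothetical names, and `VariableZone` is a toy stand-in for a zone whose offset depends on the date, not a model of any real zone.

```python
from datetime import timedelta, timezone, tzinfo


def is_fixed_zero_offset(tz):
    """Return True if tz reports a zero UTC offset without being given a datetime."""
    if tz is None:
        return False
    # A zone whose offset varies with the date is expected to return None here,
    # and None never compares equal to timedelta(0).
    return tz.utcoffset(None) == timedelta(0)


class VariableZone(tzinfo):
    """Toy zone with a crude month-based offset rule (illustrative only)."""

    def utcoffset(self, dt):
        if dt is None:
            return None  # no single answer without a concrete datetime
        return timedelta(hours=1) if 4 <= dt.month <= 9 else timedelta(0)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "TOY"


print(is_fixed_zero_offset(timezone.utc))                  # True
print(is_fixed_zero_offset(timezone(timedelta(hours=1))))  # False
print(is_fixed_zero_offset(VariableZone()))                # False
```

Under this rule a zone that merely happens to be at +00:00 for part of the year never qualifies, because it cannot report an offset when passed `None`.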

    I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually, but considering that people will be opting into this behavior, it is more likely that they will be surprised by datetime(2022, 3, 1, tzinfo=ZoneInfo("Europe/London")).isoformat(utc_as_z=True) returning 2022-03-01T00:00:00+00:00 than by the output alternating between Z and +00:00.

    Yet another option might be to add a completely separate function, utc_isoformat(*args, **kwargs), which is equivalent to (in the parlance of the other proposal) dt.astimezone(timezone.utc).isoformat(*args, **kwargs, utc_as_z=True). Basically, convert any datetime to UTC and append a Z to it. The biggest footgun there would be people using it on naïve datetimes and not realizing that it would interpret them as system local times.
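The separate-function idea could look roughly like the sketch below, written as a free function because no such method exists on `datetime`. One assumption differs from the description above and is worth flagging: this sketch rejects naive datetimes instead of interpreting them as system local time, which is one possible way to sidestep the footgun mentioned.

```python
from datetime import datetime, timedelta, timezone


def utc_isoformat(dt, sep="T", timespec="auto"):
    """Convert an aware datetime to UTC and format it with a "Z" suffix.

    Hypothetical sketch; rejecting naive input is a choice made here,
    not something the proposal specifies.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a tzinfo before formatting")
    # Convert to UTC, then drop the tzinfo so no "+00:00" is emitted.
    as_utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return as_utc.isoformat(sep=sep, timespec=timespec) + "Z"


plus_one = timezone(timedelta(hours=1))
print(utc_isoformat(datetime(2022, 6, 1, 0, 0, tzinfo=plus_one)))
# 2022-05-31T23:00:00Z
```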

    pganssle avatar Feb 02 '22 17:02 pganssle

    Would it be horrible to have the timezone instance control this?

    merwok avatar Feb 08 '22 22:02 merwok

    I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually

    This is a really good point that I hadn't considered: +00:00 and Z are semantically different, and just because a datetime has a UTC offset of 0 doesn't mean it should get a Z; Z is reserved specifically for UTC.

    It seems like the most semantically correct thing would be to only use Z if tzname() returns exactly "UTC". That would do the right thing for your London example for every major timezone library I'm aware of:

    >>> datetime.datetime.now(zoneinfo.ZoneInfo("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(zoneinfo.ZoneInfo("UTC")).tzname()
    'UTC'
    >>> datetime.datetime.now(datetime.timezone.utc).tzname()
    'UTC'
    
    >>> datetime.datetime.now(dateutil.tz.gettz("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(dateutil.tz.UTC).tzname()
    'UTC'
    
    >>> datetime.datetime.now(pytz.timezone("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(pytz.UTC).tzname()
    'UTC'
    

    I think the right rule to use conceptually is "if use_utc_designator is true and the timezone name is 'UTC' then use Z". We could also check the offset, but I'm not convinced we need to.
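As a rough illustration of that rule, here is a sketch written as a wrapper around the existing `isoformat()`. The function name `isoformat_with_designator` is hypothetical; the `use_utc_designator` keyword is the name under discussion, not an existing parameter.

```python
from datetime import datetime, timedelta, timezone


def isoformat_with_designator(dt, sep="T", timespec="auto", use_utc_designator=False):
    """Emit "Z" only when opted in and the zone name is exactly "UTC"."""
    text = dt.isoformat(sep=sep, timespec=timespec)
    if use_utc_designator and dt.tzname() == "UTC" and text.endswith("+00:00"):
        # Swap the numeric zero offset for the UTC designator.
        return text[: -len("+00:00")] + "Z"
    return text


utc_dt = datetime(2022, 3, 1, tzinfo=timezone.utc)
# A zero-offset zone that is not named "UTC", standing in for London in winter.
gmt_dt = datetime(2022, 3, 1, tzinfo=timezone(timedelta(0), "GMT"))

print(isoformat_with_designator(utc_dt, use_utc_designator=True))  # 2022-03-01T00:00:00Z
print(isoformat_with_designator(gmt_dt, use_utc_designator=True))  # 2022-03-01T00:00:00+00:00
```

Naive datetimes pass through unchanged, since their `tzname()` is `None`.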

    I think this approach is probably the best we can do, but I could also imagine that users might find it to be confusing behavior. I wonder if there's any informal user testing we can do?

    I guess the ISO 8601 spec does call "Z" the "UTC designator", so use_utc_designator seems like approximately the right name. My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z — without reading the documentation (which we can assume 99% of users won't do) — you might assume that dt.isoformat(use_utc_designator=True) would translate to dt.astimezone(timezone.utc).replace(tzinfo=None).isoformat() + "Z".

    A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly. Would be worth throwing it to a poll or something before merging.

    pganssle avatar Apr 03 '22 14:04 pganssle

    Bad idea: pass zulu=True

    It is short, memorable if you know about it, otherwise obscure enough to push people to read the docs and be clear about what it does.

    Also strange and far from obvious, so a bad idea. Unless… ?

    merwok avatar Apr 03 '22 18:04 merwok

    My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z — without reading the documentation (which we can assume 99% of users won't do)

    I was thinking along similar lines when I used use_utc_designator in the PR, but I drew a different conclusion. I was thinking that the name use_utc_designator is sufficiently abstruse that no one would even be able to guess that it's referring to "Z" without actually reading the documentation for the parameter. In particular, I worry that zulu=True or allow_Z=True might lead people to make the mistake of thinking that they'll always get "Z" instead of "+00:00".

    A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly.

    This would definitely be more memorable and more approachable. If we stick with making it conditional on tzname() == "UTC", I definitely think we want to have "utc" in the name of the parameter, and utc_as_z satisfies that.

    utc_as_z seems reasonable to me. Let me know if you'd like me to update the PR.

    After running a survey where I tried out 4 different keyword arguments at random with 972 participants (not all of them completed the survey all the way, admittedly) and asked people what they thought it does in various situations, I got the following results:

    Percentage getting the semantics question right, by kwarg:

    kwarg              | naïve  | utc   | nyc   | lon
    allow_z            | 50.495 | 89.47 | 80.90 | 42.35
    format_utc_as_z    | 41.304 | 88.68 | 54.88 | 40.00
    use_utc_designator | 22.857 | 73.68 | 47.87 | 39.29
    utc_as_z           | 38.542 | 93.27 | 57.30 | 42.86
    

    This is strange because I think none of us like allow_z, including the survey participants. When told the actual semantics of the keyword argument, they overwhelmingly prefer utc_as_z:

    utc_as_z                     165
    format_utc_as_z               78
    I don't like any of these     42
    allow_z                       19
    use_utc_designator            12
    

    I was hoping the results here would be less... ambiguous.

    pganssle avatar May 03 '22 21:05 pganssle

    Given these results and the fact that people don't seem to have a great grasp on what this is supposed to do, let's push this feature to 3.12 to try to come up with a better design.

    pganssle avatar May 03 '22 22:05 pganssle

    I have to disagree about this approach:

    >>> LON = zoneinfo.ZoneInfo("Europe/London")
    >>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-03-01T00:00:00Z
    >>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-06-01T00:00:00+01:00
    

    Mainly because you would expect a datetime in your local time zone to always use the second format (an explicit offset) rather than the first (Z as a shortcut for +00:00). I think Z should only be used when the time zone is actually UTC, not a local zone (or perhaps also when the time zone is missing?).

    In the end, what we want is to avoid this: https://i.redd.it/ocpk67fp6tx81.jpg

    Saphyel avatar May 06 '22 08:05 Saphyel

    After seeing usage in the PR, I find use_utc_designator=True a bit unwieldy, especially as it will nearly always be a keyword param.

    merwok avatar Aug 08 '23 18:08 merwok

    I suggest this is controlled in part by the tzinfo or time instance.

    For instance, in datetime.py this code in class time

    class time:
        # [...]
        def _tzstr(self):
            """Return formatted timezone offset (+xx:xx) or an empty string."""
            off = self.utcoffset()
            return _format_offset(off)
    

    can be extended as something like

    class time:
        # [...]
        def _tzstr(self):
            """Return formatted timezone offset (+xx:xx) or an empty string."""
            off = self.utcoffset()
            if not off and self.tzname() == "UTC":
                return "Z"
            return _format_offset(off)
    

    A corresponding change would be needed in class datetime.

    The string UTC is produced by

    class timezone(tzinfo):
        # ...
        @staticmethod
        def _name_from_offset(delta):
            if not delta:
                return 'UTC'
            # ...
    

    Changing the default behavior of the datetime formatting functions so that they output Z instead of +00:00 for UTC timestamps might cause backward-compatibility problems, so an opt-in argument such as utc_as_z=True might be needed. However, I don't think the utc_as_z=True proposal is particularly elegant or useful.

    So looking at the declaration of isoformat

    class datetime:
        #....
        def isoformat(self, sep='T', timespec='auto'):
            #....
    class time:
        #....
        def isoformat(self, timespec='auto'):
            # ....
    

    Perhaps we could use a new tzspec argument, named to match timespec, with values 'auto', 'utcz', 'hours', 'minutes', 'seconds', etc. 'auto' would be the current behavior. 'utcz' would be the new behavior of using 'Z' instead of '+00:00'. Further values ('hours', 'minutes', 'seconds') could be reserved for later, to control the behavior of the _format_offset() function.
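A string-valued option of this kind might behave as in the sketch below. Everything here is an assumption for illustration: `isoformat_tzspec` is a hypothetical wrapper, only 'auto' and 'utcz' are handled, and the values described as reserved are simply rejected.

```python
from datetime import datetime, timedelta, timezone

# Only the two values whose behavior is described in the comment are supported.
_SUPPORTED_TZSPECS = ("auto", "utcz")


def isoformat_tzspec(dt, sep="T", timespec="auto", tzspec="auto"):
    """Format dt; with tzspec='utcz', emit "Z" for a zero offset named "UTC"."""
    if tzspec not in _SUPPORTED_TZSPECS:
        raise ValueError(f"unsupported tzspec: {tzspec!r}")
    text = dt.isoformat(sep=sep, timespec=timespec)
    is_utc = dt.utcoffset() == timedelta(0) and dt.tzname() == "UTC"
    if tzspec == "utcz" and is_utc and text.endswith("+00:00"):
        return text[: -len("+00:00")] + "Z"
    return text


when = datetime(2023, 8, 8, 21, 0, tzinfo=timezone.utc)
print(isoformat_tzspec(when))                 # 2023-08-08T21:00:00+00:00
print(isoformat_tzspec(when, tzspec="utcz"))  # 2023-08-08T21:00:00Z
```

Because 'auto' stays the default, existing callers would see no change in output.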

    joaoe avatar Aug 08 '23 21:08 joaoe

    It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in e.g. JavaScript. There's no need for a new option.

    mohd-akram avatar May 11 '24 13:05 mohd-akram

    It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in e.g. JavaScript. There's no need for a new option.

    In principle, I support this. In practice, I suspect this won't be done because changing the default behaviour of isoformat would break code that depends on the UTC offset always being formatted the way it currently is.

    FeldrinH avatar May 11 '24 14:05 FeldrinH

    I object to the idea that using Z unconditionally is "the right thing", but also there are several logistical problems that make the idea of changing the default behavior a non-starter:

    1. Backwards compatibility — this alone will scuttle the proposal, because there are absolutely people relying on the fact that this outputs +00:00 instead of Z
    2. What it means to be "UTC" is ill-defined, as mentioned in the first post. Non-UTC datetimes might incidentally have a +00:00 offset, and it isn't clear that it is appropriate to automatically give them Z, in which case we need to come up with a reliable heuristic for what it means to be "UTC". Making this behavior auto-magical will make it even harder for people to discover the edge cases.

    Whatever we do needs to be explicit, but it seems very hard to do that, since when we ran the survey people seemed to have conflicting understandings of what any of this stuff might do; there doesn't seem to be an unambiguous way to convey what the behavior might be.

    If this were a high priority and in high demand, I would suggest we might bite the bullet and have this be just one more thing that end users need to learn about datetime, but it seems like the main motivation for adding this option was to make it so that formats ending with Z would technically satisfy the pre-3.11 contract of fromisoformat (parsing any format that .isoformat can emit), which doesn't even apply anymore.

    pganssle avatar May 11 '24 17:05 pganssle

    It might be worth adding an isoformatutc method in that case. It would kill two birds with one stone: convert the time zone to UTC and add the Z suffix, which is useful for machine processing. This is the same as JS's toISOString.

    mohd-akram avatar May 11 '24 17:05 mohd-akram