Add ISO Basic format support to datetime.isoformat() and date.isoformat()

Open mohd-akram opened this issue 1 year ago • 2 comments

Feature or enhancement

Proposal:

In addition to the popular ISO 8601 extended format, there's also an ISO 8601 basic format for datetimes, which is useful for filenames and URL components because it avoids characters such as the colon and is more compact. datetime.fromisoformat already supports parsing this format.

Example code:

datetime.isoformat(basic=True)
# 20240422T204705.335-0400
date.isoformat(basic=True)
# 20240422
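
For context on the parsing side, here is a minimal sketch of reading basic-format strings back. It assumes Python 3.11 or later, where fromisoformat was extended beyond the formats emitted by isoformat(); the sample values are illustrative only.

```python
import sys
from datetime import date, datetime

# Assumption: basic-format parsing needs Python 3.11+, so guard on the version.
if sys.version_info >= (3, 11):
    parsed_dt = datetime.fromisoformat("20240422T204705")
    parsed_date = date.fromisoformat("20240422")
    print(parsed_dt.isoformat())    # 2024-04-22T20:47:05
    print(parsed_date.isoformat())  # 2024-04-22
```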

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

mohd-akram avatar May 11 '24 13:05 mohd-akram

Does this need discussion on Discourse, or is the issue minor enough?

nineteendo avatar May 11 '24 20:05 nineteendo

I think that it's a good idea to support formatting in the basic format (that I just discovered).

cc @pganssle @abalkin

vstinner avatar May 20 '24 21:05 vstinner

Should it be basic with default value False or extended with default value True?

datetime.isoformat() has a parameter sep which specifies the separator between date and time. Taking it as a precedent, we can add similar parameters for separators between components in a date and a time. sep currently can only be a single character; it should also support an empty string.

On the other hand, adding parameters to .isoformat() is not the only way to solve this problem. You can also use .strftime() or str.replace().
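
A minimal sketch of the str.replace() route, assuming a hypothetical helper named to_basic (not part of the datetime module). It strips "-" from the date part only, so that a negative UTC offset keeps its sign.

```python
from datetime import datetime, timedelta, timezone


def to_basic(dt: datetime, timespec: str = "auto") -> str:
    """Hypothetical helper: derive the ISO 8601 basic format from isoformat()."""
    extended = dt.isoformat(timespec=timespec)
    date_part, time_part = extended.split("T", 1)
    # Remove "-" only from the date; the time part may carry a "-HH:MM" offset.
    return date_part.replace("-", "") + "T" + time_part.replace(":", "")


dt = datetime(2024, 4, 22, 20, 47, 5, 335000,
              tzinfo=timezone(timedelta(hours=-4)))
print(to_basic(dt, timespec="milliseconds"))  # 20240422T204705.335-0400
```

A naive extended.replace("-", "") over the whole string would also delete the sign of the offset, which is the kind of corner case a built-in option would handle for the caller.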

serhiy-storchaka avatar Jun 15 '24 18:06 serhiy-storchaka

I am maybe -0.5 on this feature. There is a case for putting stuff in isoformat if people are usually going to want automatic truncation, but in the use cases put forward, like filenames, you would almost certainly prefer a fixed format, so strftime(..., "%Y%M%DT%h%m%s.ext") seems like it would actually be better than this.

Taking it as a precedent, we can add similar parameters for separators between components in a date and a time. sep currently can only be a single character; it should also support an empty string.

We should definitely not do this, because ISO 8601 makes no provision for arbitrary separators, and to the extent that sep is even allowed to be something other than T, I'm fairly confident that you are not allowed to omit it entirely.

pganssle avatar Jun 16 '24 01:06 pganssle

First of all, note that my comments are based on ISO 8601:2004, which is superseded by ISO 8601:2019, which I would need to buy (but I won't). I nevertheless assume that the informative parts remain the same (namely sections 1 and 2).

Should it be basic with default value False or extended with default value True?

ISO 8601:2004 section 2.3.3 says "The basic format should be avoided in plain text." For years, isoformat() has assumed the extended format, and thus having a flag for explicitly enabling the basic format is preferable (basic=True disables the extended format and explicitly switches to the basic format). With extended=False, we would only implicitly switch to the basic format by disabling the extended one.

so strftime(..., "%Y%M%DT%h%m%s.ext") seems like it would actually be better than this.

In this case, I would agree, but this is not exactly the same as having the basic format as specified by ISO 8601. Now, while I did suggest a PR for the basic format (and would be happy if it were accepted), I'm actually wondering whether it is really needed in the end. For instance, the date command does not offer the basic format as an output by default but accepts it as input, so it could also make sense that we do not want to do it either (you can still output a basic format but you need to build it yourself, e.g., date +'%H%M%S').

picnixz avatar Jun 17 '24 08:06 picnixz

@pganssle:

There is a case for putting stuff in isoformat if people are usually going to want automatic truncation

What do you mean by automatic truncation? The idea is to add an opt-in format basic=True, by default nothing is changed. Did I miss something?

vstinner avatar Jun 17 '24 09:06 vstinner

What do you mean by automatic truncation?

When timespec is set to auto (the default), if a datetime doesn't have sub-second components, they will be excluded from the output; this, and the difference in how time zones are handled, are some of the main reasons why isoformat isn't just syntactic sugar for some strftime format:

>>> from datetime import datetime, timedelta, timezone
>>> dts = [datetime(2024, 3, 7, 12, 15, 30, 123456),
...        datetime(2024, 4, 9, 13),
...        datetime(2024, 5, 1, 16, 30, 2, 456123, tzinfo=timezone(timedelta(hours=5))),
...        datetime(2024, 6, 1, 16, 15, tzinfo=timezone(timedelta(hours=5, minutes=3, seconds=14)))]
>>> for dt in dts:
...     print(dt.isoformat())
...
2024-03-07T12:15:30.123456
2024-04-09T13:00:00
2024-05-01T16:30:02.456123+05:00
2024-06-01T16:15:00+05:03:14
>>> for dt in dts:
...     print(dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z"))
...
2024-03-07T12:15:30.123456
2024-04-09T13:00:00.000000
2024-05-01T16:30:02.456123+0500
2024-06-01T16:15:00.000000+050314

The main reasons to use .isoformat are that you want this sort of truncation to happen, or that you prefer the simplicity of "just give me a datetime that complies with this standard". The more we complicate isoformat, the more it basically becomes strftime, and it gets bogged down in complexity.

I don't think we should automatically say isoformat should never change or grow new options, but the reasoning here is not particularly compelling, because it's suggesting an opt-in format with a name that most people won't understand, where the primary motivating use case not only can be replaced by a strftime call, but arguably should be replaced by a strftime call because:

  1. It is easier to parse: both versions can be parsed by .fromisoformat, but only the strftime version can be parsed by strptime ("oops, this datetime happened to have 0 for the microsecond component and now I need a different parse format!").
  2. People reading the code will know immediately what the format is if you explicitly write it out in strftime, whereas they may not know what isoformat(basic=True) does, or what corner cases apply.
  3. For file names, you probably prefer them to have a consistent file name rather than a "pretty display" file name.

I suppose you could use dt.isoformat(timespec='seconds', basic=True) to alleviate concerns 1 and 3, but that still leaves concern 2.
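
Concern 1 can be sketched as follows. The format strings and variable names are illustrative assumptions; only documented strftime/strptime directives are used.

```python
from datetime import datetime

FIXED = "%Y%m%dT%H%M%S.%f"  # always emits six fractional digits

dt = datetime(2024, 4, 9, 13)  # microsecond component happens to be 0

# A fixed strftime format round-trips through strptime with the same format.
stamp = dt.strftime(FIXED)
roundtrip = datetime.strptime(stamp, FIXED)
print(stamp)            # 20240409T130000.000000
print(roundtrip == dt)  # True

# With timespec="auto", isoformat() drops the fraction, so a strptime
# format that expects ".%f" no longer matches.
auto = dt.isoformat()
try:
    datetime.strptime(auto, "%Y-%m-%dT%H:%M:%S.%f")
    auto_parses = True
except ValueError:
    auto_parses = False
print(auto, auto_parses)  # 2024-04-09T13:00:00 False

# Pinning timespec restores a fixed-width output.
pinned = dt.isoformat(timespec="microseconds")
print(pinned)  # 2024-04-09T13:00:00.000000
```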

pganssle avatar Jun 18 '24 01:06 pganssle

How about dt.isoformat(timespec='seconds', short=True)? That's use case oriented.

short: 20240601T161500.000000
long:  2024-06-01T16:15:00.000000

nineteendo avatar Jun 18 '24 05:06 nineteendo

The term basic is the term used in the ISO standard and should be left as is IMO (if we were to support it).

picnixz avatar Jun 18 '24 07:06 picnixz

I agree with @picnixz, the name here is not the problem. If the survey on the API for outputting Z is any guide, it is really hard to do something unambiguous. basic=True is almost certainly the best you can do, because it is the standard term for it so it is probably unambiguous, and worst case scenario you can google that term.

That said, almost everyone will have to google that term. I have read ISO 8601 several times, and I implemented two mostly full-featured ISO 8601 parsers, and I had to look up the term to see if it was an official term. No one is going to know what short=True does without looking it up or reading the docs. basic is definitely the best term for this, and it will undoubtedly create cognitive load relative to an explicitly specified format.

I think the main blocker here is that there's no compelling use case (and there actually kind of is a compelling use case for #90772, and we still didn't do that one because we couldn't come up with a non-confusing UX for it).

pganssle avatar Jun 18 '24 18:06 pganssle

The motivation for the ISO basic format is the same as the extended format - that it is a standardized machine-readable format that ensures seamless interoperability. You do not get that with many potentially subtly incorrect strftime/strptime implementations, as doing it right requires reading and implementing the spec correctly. That machinery is already implemented in Python, and you can also specify your desired granularity with timespec. Doing this manually would require creating strftime/strptime pairs for each case.

it will undoubtedly create cognitive load relative to an explicitly specified format.

IMO, unless one has the specification table memorized, I don't think "ISO but without - and :" would be more of a cognitive load than figuring out what %Y%M%DT%h%m%s.ext (which is subtly wrong) does.
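
To illustrate how easily the directives get mixed up, here is a small sketch using only directives documented for Python's strftime. BASIC_SECONDS is an illustrative name, not an existing constant.

```python
from datetime import datetime

dt = datetime(2024, 4, 22, 20, 47, 5)

# Lowercase %m and %d are month and day; uppercase %H, %M, %S are the time.
BASIC_SECONDS = "%Y%m%dT%H%M%S"
print(dt.strftime(BASIC_SECONDS))  # 20240422T204705

# %M is the minute, so using it in the date position silently emits
# the wrong value instead of raising an error.
print(dt.strftime("%Y%M"))  # 202447
```

Other codes in the quoted string, such as %D, %h and %s, are left out here because their behaviour depends on the platform's C library.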

mohd-akram avatar Jun 18 '24 20:06 mohd-akram

Should it be basic with default value False or extended with default value True?

Or format: datetime.Format = datetime.Format.extended?

antonagestam avatar Jun 26 '24 11:06 antonagestam