`zipfile`: docs should document what `ZipInfo.date_time` actually is

Open calestyo opened this issue 8 months ago • 2 comments

Documentation

Currently, the documentation for ZipInfo.date_time is:

The time and date of the last modification to the archive member. This is a tuple of six values: ... Note: The ZIP file format does not support timestamps before 1980.

This should be extended/clarified to define which date/time is meant (currently this is only hinted at by the note that the format "does not support timestamps before 1980").

The ZIP format supports multiple timestamps:

  • there's the last mod file time and last mod file date in the central directory (see chapters 4.3.7 and 4.4.6)
  • but also various times (e.g. for NTFS and UNIX) as part of the extra fields (see e.g. chapter 4.5.7).

The latter may e.g. support times before 1980, higher resolutions and time zones.

Right now it does not seem to be strictly defined what .date_time actually returns. From what I can see in the code, it's the one from the central directory, but that could in principle be just an implementation detail, and a future version of zipfile could e.g. use the extra fields if present.
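To make the distinction concrete, here is a minimal sketch that reads .date_time and, separately, walks the raw extra-field bytes that zipfile exposes as ZipInfo.extra. The helper name iter_extra_fields is hypothetical (it is not part of zipfile), and it assumes the usual extra-field layout of a 2-byte header ID and a 2-byte data size, both little-endian, followed by the data. It only lists which header IDs are present; it does not decode any timestamps.

```python
import io
import struct
import zipfile


def iter_extra_fields(extra):
    """Yield (header_id, data) pairs from raw extra-field bytes.

    Hypothetical helper: assumes each block is <header ID: uint16>
    <data size: uint16> <data>, all little-endian.
    """
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        yield header_id, extra[offset + 4 : offset + 4 + size]
        offset += 4 + size


# Build a small archive in memory with an explicit modification time.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("hello.txt", date_time=(2024, 5, 17, 10, 30, 44))
    zf.writestr(info, b"hi")

with zipfile.ZipFile(buf) as zf:
    member = zf.getinfo("hello.txt")
    # The six-tuple this issue is about.
    print(member.date_time)
    # Header IDs of any extra-field blocks; timestamps stored there
    # are not reflected in date_time by this code.
    print([hex(header_id) for header_id, _ in iter_extra_fields(member.extra)])
```

If an archive written by another tool carries NTFS or UNIX timestamps, their header IDs would show up in the second list while .date_time stays whatever the DOS-style fields say, which is exactly the ambiguity the documentation could resolve.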

Thus it would be nice if the documentation could specify whether .date_time is always meant to be the time from the central directory, or whether this could change.

Thanks, Chris.

Linked PRs

  • gh-136082

calestyo avatar Apr 29 '25 02:04 calestyo

Can I work on this?

LordGvozd avatar Apr 29 '25 10:04 LordGvozd

Sure, go ahead! :)

sobolevn avatar Apr 29 '25 10:04 sobolevn

Can I work on this?

@LordGvozd Sorry, are you still working on this? Didn't realize it 🙏

KentaroJay avatar Jul 03 '25 15:07 KentaroJay

Should we also add some more information about:

  • Year upper limit (2107). This could be added to the table describing the values in the tuple.

  • Seconds are stored with 2-second precision. When creating a zipfile entry, date_time is initially sourced via time.localtime(st_mtime) and may have an odd number of seconds before it is written out as dt[5] // 2. Given that this value is not always at a resolution of 2 seconds, I don't think we should say in that table that seconds have a 2-second resolution, but perhaps mention that the value has a 2-second resolution when written or read. Perhaps we mention this in or after the paragraph about local time vs UTC?

Also, separately, should we change the code so that date_time only ever has a 2 second precision? It would allow round trips to work more than 50% of the time? :p
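A quick way to see the round-trip behaviour described above is to write a member with an odd number of seconds and read it back. This is only a sketch; roundtrip_date_time is a hypothetical helper name and the timestamps are arbitrary.

```python
import io
import zipfile


def roundtrip_date_time(date_time):
    """Write one member with the given date_time and read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("member.txt", date_time=date_time), b"data")
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("member.txt").date_time


odd = (2025, 9, 3, 4, 9, 37)
even = (2025, 9, 3, 4, 9, 36)

# The odd second is halved with // on write and doubled on read,
# so it comes back rounded down to the even second.
print(roundtrip_date_time(odd))   # (2025, 9, 3, 4, 9, 36)
print(roundtrip_date_time(even))  # (2025, 9, 3, 4, 9, 36)
```

So the in-memory ZipInfo can hold 37 seconds while the archive on disk holds 36, which is why documenting the resolution as a property of writing and reading seems more accurate than documenting it as a property of the tuple.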

danifus avatar Sep 03 '25 04:09 danifus

I don't think the Python docs should be in the business of overly documenting the zip format, and I worry describing the DOS date/time format in detail is doing exactly that.

Thinking about this a bit more, I think it would be better to say that the CD date and time are in DOS format, perhaps linking to https://learn.microsoft.com/en-us/windows/win32/sysinfo/ms-dos-date-and-time. Then give one or two examples of limitations (e.g. cannot express dates before 1980, 2-second precision).
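For readers who want to see where those limitations come from, here is a rough sketch of the MS-DOS date/time packing, assuming the layout on the linked Microsoft page: a 16-bit date with 7 bits of year offset from 1980, 4 bits of month and 5 bits of day, and a 16-bit time with 5 bits of hour, 6 bits of minute and 5 bits of seconds divided by 2. The function names pack_dos and unpack_dos are made up for illustration and are not zipfile APIs.

```python
def pack_dos(date_time):
    """Pack a (year, month, day, hour, minute, second) tuple into
    (dos_date, dos_time) 16-bit integers. Illustrative only."""
    year, month, day, hour, minute, second = date_time
    # 7 bits of year offset: 1980 + 0 .. 1980 + 127.
    if not 1980 <= year <= 2107:
        raise ValueError("year must be in 1980..2107 for DOS format")
    dos_date = (year - 1980) << 9 | month << 5 | day
    # Only 5 bits for seconds, so they are stored halved.
    dos_time = hour << 11 | minute << 5 | second // 2
    return dos_date, dos_time


def unpack_dos(dos_date, dos_time):
    """Inverse of pack_dos, up to the lost odd second."""
    return (
        (dos_date >> 9) + 1980,
        (dos_date >> 5) & 0xF,
        dos_date & 0x1F,
        dos_time >> 11,
        (dos_time >> 5) & 0x3F,
        (dos_time & 0x1F) * 2,
    )


latest = (2107, 12, 31, 23, 59, 59)
print(unpack_dos(*pack_dos(latest)))  # (2107, 12, 31, 23, 59, 58)
```

Both limitations mentioned above fall out of the bit widths: the year field cannot go below 1980 or above 2107, and the seconds field cannot represent odd values.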

emmatyping avatar Oct 20 '25 04:10 emmatyping