gh-121021: zipfile: Support for extended_mtime attribute
ZipInfo now supports a new attribute, extended_mtime.
fixes: #121021
- Issue: gh-121021
Info-Zip's description of this extra field in proginfo/extrafld.txt has a few details that make it tricky to get all three times from the extra fields via Python's ZipInfo (their description is attached below).
The Python implementation populates the ZipInfo data from the central directory, which, according to Info-Zip's description, may contain only the modification time. Python only reads the local header (where all three fields may be present) when opening the archived entry for reading/decompression, and that read is used primarily for sanity checks before decompression rather than for extracting additional extra fields for further use.
You could rework this PR to focus on extracting the modified time when present if that's still useful for you?
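For illustration, here is a minimal sketch of what extracting only the modification time from the central-directory data could look like. The function name find_central_mtime is hypothetical and is not part of zipfile; it assumes the raw bytes come from ZipInfo.extra and it is deliberately lenient (a malformed block yields None rather than an error).

```python
import struct

EXTENDED_TIMESTAMP_ID = 0x5455  # "UT" tag from extrafld.txt


def find_central_mtime(extra):
    """Return ModTime from a central-directory extra blob, or None.

    Hypothetical helper: `extra` is assumed to be the raw bytes
    stored in ZipInfo.extra.
    """
    pos = 0
    # Each extra block is: tag (2 bytes), size (2 bytes), then `size` data bytes.
    while pos + 4 <= len(extra):
        tag, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        if tag == EXTENDED_TIMESTAMP_ID:
            # Flags byte first, then ModTime if the central header carries it
            # (TSize is 5 when it is present, 1 when it is absent).
            if len(data) >= 5 and data[0] & 0x01:
                return struct.unpack_from("<l", data, 1)[0]
            return None
        pos += 4 + size
    return None
```

Blocks with other tags are skipped by their declared size, so the "UT" block does not need to come first in the extra data.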
Another potential issue: the time values are in standard Unix signed-long format, so they will be affected by the year 2038 problem. That might not be a problem, as we're just reading values...
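To make the limit concrete, a small sketch assuming the "Long" fields are 4-byte signed little-endian integers, as the TSize formula in the description implies:

```python
import struct
from datetime import datetime, timezone

# Largest value a signed 32-bit "Long" field can hold.
MAX_TIME = 2**31 - 1

last_moment = datetime.fromtimestamp(MAX_TIME, tz=timezone.utc)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# One second later no longer fits in the field.
try:
    struct.pack("<l", MAX_TIME + 1)
except struct.error as exc:
    print("cannot encode:", exc)
```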
Raising a BadZipFile when ln is zero (no flags to read) or when decoding the time fails would be consistent with how the other extra fields are handled.
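Something along these lines, as a rough sketch only. The function name decode_extended_mtime and the error messages are made up for illustration, and data is assumed to be the block payload that follows the tag and size:

```python
import struct
from zipfile import BadZipFile


def decode_extended_mtime(data):
    """Decode the payload of a "UT" block (hypothetical helper)."""
    if len(data) == 0:
        # No flags byte to read at all.
        raise BadZipFile("Corrupt extended timestamp extra field: no flags")
    flags = data[0]
    if not flags & 0x01:
        return None  # modification time was not recorded
    try:
        (mtime,) = struct.unpack_from("<l", data, 1)
    except struct.error as exc:
        # Flags promise a ModTime but fewer than 4 bytes follow.
        raise BadZipFile(
            "Corrupt extended timestamp extra field: truncated time"
        ) from exc
    return mtime
```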
A test showing reading the field successfully would be useful. You could look to the tests for the utf filename extra fields for inspiration: https://github.com/python/cpython/blob/main/Lib/test/test_zipfile/test_core.py#L1837
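As a starting point, a round trip through an in-memory archive might look like the sketch below. It only uses ZipInfo.extra, which already exists, and decodes the bytes by hand; the file name and timestamp are arbitrary example values, and a real test would assert on the new attribute instead.

```python
import io
import struct
import zipfile

MTIME = 1_700_000_000  # arbitrary example value

# "UT" block: tag, TSize=5, Flags with bit 0 set, ModTime.
ut_block = struct.pack("<HHBl", 0x5455, 5, 0x01, MTIME)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("example.txt")
    info.extra = ut_block
    zf.writestr(info, b"hello")

# Reopen the archive; ZipInfo.extra now comes from the central directory.
with zipfile.ZipFile(buf) as zf:
    extra = zf.getinfo("example.txt").extra

tag, size, flags, mtime = struct.unpack("<HHBl", extra)
```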
-Extended Timestamp Extra Field:
==============================
The following is the layout of the extended-timestamp extra block.
(Last Revision 19970118)
Local-header version:

        Value       Size    Description
        -----       ----    -----------
(time)  0x5455      Short   tag for this extra block type ("UT")
        TSize       Short   total data size for this block
        Flags       Byte    info bits
        (ModTime)   Long    time of last modification (UTC/GMT)
        (AcTime)    Long    time of last access (UTC/GMT)
        (CrTime)    Long    time of original creation (UTC/GMT)

Central-header version:

        Value       Size    Description
        -----       ----    -----------
(time)  0x5455      Short   tag for this extra block type ("UT")
        TSize       Short   total data size for this block
        Flags       Byte    info bits (refers to local header!)
        (ModTime)   Long    time of last modification (UTC/GMT)
The central-header extra field contains the modification time only,
or no timestamp at all. TSize is used to flag its presence or
absence. But note:
If "Flags" indicates that Modtime is present in the local header
field, it MUST be present in the central header field, too!
This correspondence is required because the modification time
value may be used to support trans-timezone freshening and
updating operations with zip archives.
The time values are in standard Unix signed-long format, indicating
the number of seconds since 1 January 1970 00:00:00. The times
are relative to Coordinated Universal Time (UTC), also sometimes
referred to as Greenwich Mean Time (GMT). To convert to local time,
the software must know the local timezone offset from UTC/GMT.
The lower three bits of Flags in both headers indicate which time-
stamps are present in the LOCAL extra field:
bit 0 if set, modification time is present
bit 1 if set, access time is present
bit 2 if set, creation time is present
bits 3-7 reserved for additional timestamps; not set
Those times that are present will appear in the order indicated, but
any combination of times may be omitted. (Creation time may be
present without access time, for example.) TSize should equal
(1 + 4*(number of set bits in Flags)), as the block is currently
defined. Other timestamps may be added in the future.
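For reference, the local-header layout described above could be decoded roughly as follows. This is a sketch under assumptions: decode_local_timestamps and the dict keys are hypothetical names, only the lower three flag bits are considered, and data is the block payload after the tag and TSize.

```python
import struct


def decode_local_timestamps(data):
    """Decode a local-header "UT" payload into a dict (hypothetical helper)."""
    if not data:
        raise ValueError("missing Flags byte")
    flags = data[0]
    # Times appear in this fixed order; any of them may be omitted.
    names = [
        name
        for bit, name in enumerate(("mtime", "atime", "ctime"))
        if flags & (1 << bit)
    ]
    expected = 1 + 4 * len(names)  # TSize = 1 + 4 * (number of set bits)
    if len(data) != expected:
        raise ValueError(f"TSize is {len(data)}, expected {expected}")
    values = struct.unpack(f"<{len(names)}l", data[1:])
    return dict(zip(names, values))
```

For example, Flags 0b101 followed by two Longs yields the modification and creation times with no access time.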
@danifus Thanks a lot for the review. I updated the PR with your suggestions.
@danifus Anything else required over here? Thanks.
No more changes from me :) Nice work
LGTM
@danifus Do I need to do anything to get this merged? Thanks.
You'll need to get a review from a core developer and then they can help get it merged (I also have a few waiting :p ). Have a read of the experts index https://devguide.python.org/core-developers/experts/index.html - there may be someone in there who could review it for you.
@serhiy-storchaka, @Yhg1s, @gpshead Could you please review this PR? Thanks in advance.