gh-121021: zipfile: Support for extended_mtime attribute

Open Akasurde opened this issue 1 year ago • 4 comments

ZipInfo now supports the extended_mtime attribute.

fixes: #121021


📚 Documentation preview 📚: https://cpython-previews--121020.org.readthedocs.build/

  • Issue: gh-121021

Akasurde avatar Jun 26 '24 03:06 Akasurde

Info-ZIP's proginfo/extrafld.txt description of this extra field has a few details that make it tricky to get all three times from the extra fields via Python's ZipInfo (their description is attached below).

The Python implementation populates the ZipInfo data from the central directory which, according to Info-ZIP's description, may contain only the modified time. Python reads the local header (with all three fields potentially present) only when opening the archived entry for reading/decompression, and that read is used primarily for sanity checks before decompression rather than for extracting additional extra fields for later use.

You could rework this PR to focus on extracting the modified time when present if that's still useful for you?

Another potential issue is that the time values are in standard Unix signed-long format, so they will be affected by the year 2038 problem. That might not be a problem as we're just reading values...
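
To make the 2038 limit concrete, here is a minimal sketch that assumes the timestamps are decoded as 4-byte signed little-endian integers (the `<l` struct format). It only illustrates the range of the field and is not taken from the PR.

```python
import struct
from datetime import datetime, timezone

# Largest value that fits in a 4-byte signed little-endian field.
raw = struct.pack("<l", 2**31 - 1)
(latest,) = struct.unpack("<l", raw)
latest_dt = datetime.fromtimestamp(latest, tz=timezone.utc)
print(latest_dt.isoformat())

# One second later no longer fits, so a writer could not encode it.
try:
    struct.pack("<l", 2**31)
    overflowed = False
except struct.error:
    overflowed = True
print("overflow on write:", overflowed)
```

Reading is unaffected as long as the stored value is valid; the limit only bites when a writer needs to store a later time.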

Raising BadZipFile when ln is zero (no flags to read) or when decoding the time fails would be consistent with how the other extra fields are handled.

A test showing reading the field successfully would be useful. You could look to the tests for the utf filename extra fields for inspiration: https://github.com/python/cpython/blob/main/Lib/test/test_zipfile/test_core.py#L1837
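
As a rough idea of what such a test could exercise, the sketch below writes an archive whose entry carries a hand-built "UT" extra field and reads the raw bytes back through the existing `ZipInfo.extra` attribute. The file name and timestamp are made up for illustration, and the sketch deliberately does not use the new attribute from this PR, since its final API is not shown here.

```python
import io
import struct
import zipfile

# Hypothetical values for illustration only.
mtime = 1_700_000_000
# tag 0x5455 ("UT"), TSize 5, Flags with bit 0 set, then ModTime.
ut_field = struct.pack("<HHBl", 0x5455, 5, 0x01, mtime)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("hello.txt")
    info.extra = ut_field  # written to the local and central headers
    zf.writestr(info, b"hello")

with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("hello.txt").extra  # read from the central directory

tag, size, flags, decoded = struct.unpack("<HHBl", stored[:9])
print(hex(tag), size, flags, decoded)
```

A real test would then compare the decoded value against whatever the new attribute reports.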

         -Extended Timestamp Extra Field:
          ==============================

          The following is the layout of the extended-timestamp extra block.
          (Last Revision 19970118)

          Local-header version:

          Value         Size        Description
          -----         ----        -----------
  (time)  0x5455        Short       tag for this extra block type ("UT")
          TSize         Short       total data size for this block
          Flags         Byte        info bits
          (ModTime)     Long        time of last modification (UTC/GMT)
          (AcTime)      Long        time of last access (UTC/GMT)
          (CrTime)      Long        time of original creation (UTC/GMT)

          Central-header version:

          Value         Size        Description
          -----         ----        -----------
  (time)  0x5455        Short       tag for this extra block type ("UT")
          TSize         Short       total data size for this block
          Flags         Byte        info bits (refers to local header!)
          (ModTime)     Long        time of last modification (UTC/GMT)

          The central-header extra field contains the modification time only,
          or no timestamp at all.  TSize is used to flag its presence or
          absence.  But note:

              If "Flags" indicates that Modtime is present in the local header
              field, it MUST be present in the central header field, too!
              This correspondence is required because the modification time
              value may be used to support trans-timezone freshening and
              updating operations with zip archives.

          The time values are in standard Unix signed-long format, indicating
          the number of seconds since 1 January 1970 00:00:00.  The times
          are relative to Coordinated Universal Time (UTC), also sometimes
          referred to as Greenwich Mean Time (GMT).  To convert to local time,
          the software must know the local timezone offset from UTC/GMT.

          The lower three bits of Flags in both headers indicate which time-
          stamps are present in the LOCAL extra field:

                bit 0           if set, modification time is present
                bit 1           if set, access time is present
                bit 2           if set, creation time is present
                bits 3-7        reserved for additional timestamps; not set

          Those times that are present will appear in the order indicated, but
          any combination of times may be omitted.  (Creation time may be
          present without access time, for example.)  TSize should equal
          (1 + 4*(number of set bits in Flags)), as the block is currently
          defined.  Other timestamps may be added in the future.
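
The layout above can be sketched as a small parser. This is only an illustration under stated assumptions: `parse_extended_timestamp` and its `central` parameter are hypothetical names, it raises `ValueError` rather than `BadZipFile` to stay independent of zipfile, and it is not the code from this PR.

```python
import struct

UT_TAG = 0x5455  # "UT"


def parse_extended_timestamp(extra, central=True):
    """Return the timestamps found in a raw extra-field blob as a dict.

    Hypothetical helper: keys are 'mtime', 'atime', 'ctime'.
    """
    pos = 0
    while pos + 4 <= len(extra):
        tag, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if tag != UT_TAG:
            continue  # skip unrelated extra blocks
        if size < 1 or len(data) != size:
            raise ValueError("corrupt extended timestamp extra field")
        flags = data[0]
        # In the central header the flags describe the LOCAL header,
        # but at most ModTime is actually stored.
        names = ("mtime",) if central else ("mtime", "atime", "ctime")
        times = {}
        offset = 1
        for bit, name in enumerate(names):
            if not flags & (1 << bit):
                continue
            if offset + 4 > size:
                if central:
                    break  # TSize signals that no timestamp is stored
                raise ValueError("field shorter than its flags imply")
            (times[name],) = struct.unpack_from("<l", data, offset)
            offset += 4
        return times
    return {}


# Creation time present without access time, as the spec allows.
local = struct.pack("<HHBll", UT_TAG, 9, 0b101, 10, 30)
print(parse_extended_timestamp(local, central=False))
```

Passing `central=True` (the default here) matches how ZipInfo is populated from the central directory, where only the modification time can be recovered.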

danifus avatar Jun 26 '24 13:06 danifus

@danifus Thanks a lot for the review. I updated the PR with your suggestions.

Akasurde avatar Jun 26 '24 17:06 Akasurde

@danifus Anything else required over here? Thanks.

Akasurde avatar Jul 02 '24 23:07 Akasurde

No more changes from me :) Nice work

LGTM

danifus avatar Jul 03 '24 00:07 danifus

@danifus Do I need to do anything to get this merged? Thanks.

Akasurde avatar Aug 17 '24 02:08 Akasurde

You'll need to get a review from a core developer and then they can help get it merged (I also have a few waiting :p ). Have a read of the experts index https://devguide.python.org/core-developers/experts/index.html - there may be someone in there who could review it for you.

danifus avatar Aug 20 '24 00:08 danifus

@serhiy-storchaka, @Yhg1s, @gpshead Could you please review this PR? Thanks in advance.

Akasurde avatar Aug 21 '24 16:08 Akasurde