zipfile regression in 3.11 beta: ignores utf8 flag and removes it from existing entries

Open obfusk opened this issue 2 years ago • 6 comments

Bug report

apksigcopier CI started failing on 3.11 beta because of this seemingly unnecessary change in #32007 (which added fallback encoding support):

     def _encodeFilenameFlags(self):
         try:
-            return self.filename.encode('ascii'), self.flag_bits
+            return self.filename.encode('ascii'), self.flag_bits & ~_MASK_UTF_FILENAME
         except UnicodeEncodeError:

which results in:

  • the utf8 flag in ZipInfo.flag_bits (whether set manually or carried over from an existing entry that already had it) is completely ignored when file headers and the central directory are written: it is now set or unset purely based on whether the file name can be encoded as ascii or requires utf8, regardless of its current value. Before this change, the flag was already set when required but never unset when not required (and ascii filenames with the utf8 flag set are completely fine and common, but now impossible to write);
  • as a result, the utf8 flag is removed from existing zip entries (whose names can be encoded as ascii but have the flag set nonetheless) in the central directory, e.g. when appending to an existing zip file. It is presumably not removed from the individual file headers of unmodified existing entries, which makes those headers inconsistent with the central directory.

This results in zip files that are no longer bitwise identical, making the existing lack of support for creating reproducible/deterministic zip files even worse than before.
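
For readers who want to check which behavior their interpreter has, here is a minimal sketch (not part of the original report). It builds a local file header with ZipInfo.FileHeader() and reads back the general purpose flag bits. The names UTF8_FLAG, make_info and header_flag_bits are hypothetical helpers introduced for illustration; 0x800 is bit 11 of the general purpose flags, which is what the utf8 flag refers to. On an affected 3.11 beta the first check is expected to report False, and True otherwise.

```python
import struct
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (hypothetical constant name)


def make_info(name: str, flag_bits: int = 0) -> zipfile.ZipInfo:
    """Create a ZipInfo that is complete enough to build a file header from."""
    info = zipfile.ZipInfo(name)
    info.flag_bits = flag_bits
    info.CRC = 0  # set explicitly so FileHeader() never sees an unset CRC
    return info


def header_flag_bits(info: zipfile.ZipInfo) -> int:
    """Return the flag bits that would be written to the local file header."""
    header = info.FileHeader()
    # Local file header layout: 4-byte signature, 2-byte "version needed",
    # then the 2-byte little-endian general purpose bit flag at offset 6.
    (flags,) = struct.unpack_from("<H", header, 6)
    return flags


# ASCII name with the utf8 flag set manually: should the flag survive?
ascii_keeps_flag = bool(header_flag_bits(make_info("hello.txt", UTF8_FLAG)) & UTF8_FLAG)

# Non-ASCII name without the flag: zipfile sets the flag itself.
non_ascii_gets_flag = bool(header_flag_bits(make_info("h\u00e9llo.txt")) & UTF8_FLAG)

print("ASCII name keeps a manually set utf8 flag:", ascii_keeps_flag)
print("non-ASCII name gets the utf8 flag:", non_ascii_gets_flag)
```

This only exercises the local file header path; the central directory is written through the same _encodeFilenameFlags method, so the same flag handling applies there.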

Please revert this change; I'm happy to make a PR for that.

Your environment

  • CPython versions tested on: 3.11.0-beta.5 (Ubuntu 20.04.4, GitHub actions), 3.11.0-beta.4 (Debian testing/unstable)
  • Operating system and architecture: see above (both amd64)

Jul 30 '22 12:07 obfusk

NB: this regression also affects f-droid (which uses apksigcopier for reproducible builds).

cc @eighthave

Aug 15 '22 18:08 obfusk

CC: @gpshead as you reviewed the PR.

Unless someone has a good reason not to, I plan to revert b5cf7374d79aae191c1e38d0959527e0bbb7a95e0df5b125a8b1056cc2c54851L483 tomorrow.

Aug 15 '22 22:08 pablogsal

@obfusk Thanks for the very detailed report ❤️

Aug 15 '22 22:08 pablogsal

Yes, good writeup. I'm fine with a revert of PR #32007's a25a985535ccbb7df8caddc0017550ff4eae5855. (I think you mispasted something else that isn't a commit id in your message pablogsal)

It seems more is needed here in order to preserve existing bits.

Aug 15 '22 22:08 gpshead

Oh I see, you were linking just to the particular one line of the change. :) If undoing just that line itself is the fix, great. It sounds like we don't have test coverage for this situation though.

Aug 15 '22 22:08 gpshead

> Oh I see, you were linking just to the particular one line of the change. :) If undoing just that line itself is the fix, great. It sounds like we don't have test coverage for this situation though.

Yes, reverting just the single line changed in _encodeFilenameFlags (the diff in my report above) should be the fix.

I can confirm that monkey-patching _encodeFilenameFlags to the code from before this change makes apksigcopier CI succeed again.

Of course, adding additional test coverage to prevent regressions like this in the future would certainly be nice IMO.

Thanks for getting this fixed before 3.11 is released!

Aug 16 '22 01:08 obfusk

This seems to be fixed, so I am closing it. Please re-open if you think we missed something.

Sep 05 '22 22:09 pablogsal

apksigcopier CI is green again with 3.11rc2 and the workaround removed, so this does indeed seem to be fixed, as expected.

Thanks!

Sep 24 '22 21:09 obfusk