cpython
zipfile regression in 3.11 beta: ignores utf8 flag and removes it from existing entries
Bug report
apksigcopier CI started failing on 3.11 beta because of this seemingly unnecessary change in #32007 (which added fallback encoding support):
 def _encodeFilenameFlags(self):
     try:
-        return self.filename.encode('ascii'), self.flag_bits
+        return self.filename.encode('ascii'), self.flag_bits & ~_MASK_UTF_FILENAME
     except UnicodeEncodeError:
which results in:
- the utf8 flag in ZipInfo.flag_bits (whether set manually or from an existing entry that already had it) being completely ignored (i.e. set/unset based on whether the file name can be encoded in ascii or requires utf8, regardless of the current value) when file headers and the central directory are written, whereas before it was already being set when required but not unset if not required (and having ascii filenames with the utf8 flag set is completely fine and common, but now impossible);
- resultingly, removal of the utf8 flag from existing zip entries (with names that can be encoded in ascii but have it set nonetheless) in the central directory (though presumably not from the individual file headers for unmodified existing entries, making them inconsistent with the central directory), e.g. when appending to an existing zip file.
This resulted in zip files that were no longer bitwise identical, making the existing lack of support for creating reproducible/deterministic zip files even worse than before.
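To make the difference concrete, here is a minimal standalone sketch of the two behaviours described above. The function names and the `UTF8_FLAG` constant are illustrative and not part of `zipfile`. The `except` branch is not shown in the diff, so the version here is an assumption based on the description that the flag "was already being set when required".

```python
UTF8_FLAG = 0x800  # general purpose bit 11: file name is encoded as utf8


def encode_flags_before(filename, flag_bits):
    """Sketch of the old behaviour: set the flag if needed, never clear it."""
    try:
        return filename.encode("ascii"), flag_bits
    except UnicodeEncodeError:
        # Assumed branch (not shown in the report's diff).
        return filename.encode("utf-8"), flag_bits | UTF8_FLAG


def encode_flags_beta(filename, flag_bits):
    """Sketch of the 3.11 beta behaviour: the flag is recomputed from the name."""
    try:
        # An ASCII-encodable name now always loses the utf8 flag.
        return filename.encode("ascii"), flag_bits & ~UTF8_FLAG
    except UnicodeEncodeError:
        # Assumed branch (not shown in the report's diff).
        return filename.encode("utf-8"), flag_bits | UTF8_FLAG


# An ASCII name that already carries the utf8 flag:
old_name, old_flags = encode_flags_before("a.txt", UTF8_FLAG)
new_name, new_flags = encode_flags_beta("a.txt", UTF8_FLAG)
print(hex(old_flags), hex(new_flags))
```

The two functions agree for names that need utf8 and differ only for ASCII names that already have the flag set, which is exactly the case that breaks bitwise reproducibility.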
Please revert this change; I'm happy to make a PR for that.
Your environment
- CPython versions tested on: 3.11.0-beta.5 (Ubuntu 20.04.4, GitHub actions), 3.11.0-beta.4 (Debian testing/unstable)
- Operating system and architecture: see above (both amd64)
NB: this regression also affects f-droid (which uses apksigcopier for reproducible builds).
cc @eighthave
CC: @gpshead as you reviewed the PR.
Unless someone has a good reason not to, I plan to revert b5cf7374d79aae191c1e38d0959527e0bbb7a95e0df5b125a8b1056cc2c54851L483 tomorrow.
@obfusk Thanks for the very detailed report ❤️
Yes, good writeup. I'm fine with a revert of PR #32007's a25a985535ccbb7df8caddc0017550ff4eae5855. (I think you mispasted something else that isn't a commit id in your message pablogsal)
It seems more is needed here in order to preserve existing bits.
Oh I see, you were linking just to the particular one line of the change. :) If undoing just that line itself is the fix, great. It sounds like we don't have test coverage for this situation though.
Yes, reverting just the single line changed in _encodeFilenameFlags (the diff in my report above) should be the fix.
I can confirm that monkey-patching _encodeFilenameFlags to the code from before this change makes apksigcopier CI succeed again.
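For anyone who needs a similar stopgap, a monkey-patch along these lines might look roughly as follows. This is only a sketch: the function name is hypothetical, the actual apksigcopier workaround is not shown in this thread, and `_encodeFilenameFlags` is a private `zipfile` method that may change without notice.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (utf8 file name)


def _encode_filename_flags_compat(self):
    """Hypothetical replacement that keeps whatever flag bits are already set."""
    try:
        return self.filename.encode("ascii"), self.flag_bits
    except UnicodeEncodeError:
        # Assumed branch: only ever add the utf8 flag, never remove it.
        return self.filename.encode("utf-8"), self.flag_bits | UTF8_FLAG


# Replace the private method on the class so every ZipInfo instance uses it.
zipfile.ZipInfo._encodeFilenameFlags = _encode_filename_flags_compat
```

A patch like this should be removed again once the interpreter in use carries the fix.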
Of course, adding additional test coverage to prevent regressions like this in the future would certainly be nice IMO.
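As a starting point, a regression check for the append case could be sketched like this. It is not taken from the CPython test suite, and the helper names are made up for illustration. Since `zipfile` offers no public way to write an ASCII-named entry with the utf8 flag set, the sketch patches the flag bytes directly, relying on the zip format's header layout (flag bits at offset 6 in a local file header and offset 8 in a central directory header).

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (utf8 file name)


def make_zip_with_utf8_flag(name="a.txt", payload=b"hello"):
    """Build a one-entry zip, then set the utf8 flag on its ASCII-named entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    raw = bytearray(buf.getvalue())
    # Patch the little-endian 16-bit flag field in both headers.
    for signature, offset in ((b"PK\x03\x04", 6), (b"PK\x01\x02", 8)):
        pos = raw.find(signature) + offset
        (flags,) = struct.unpack_from("<H", raw, pos)
        struct.pack_into("<H", raw, pos, flags | UTF8_FLAG)
    return bytes(raw)


def utf8_flag_survives_append():
    """Append an entry and report whether the existing entry keeps its flag."""
    buf = io.BytesIO(make_zip_with_utf8_flag())
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr("b.txt", b"world")  # forces the central directory rewrite
    with zipfile.ZipFile(buf) as zf:
        return bool(zf.getinfo("a.txt").flag_bits & UTF8_FLAG)


print(utf8_flag_survives_append())
```

On an interpreter without the regression this is expected to report that the flag survives. On the affected betas the rewritten central directory would drop it.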
Thanks for getting this fixed before 3.11 is released!
This seems to be fixed, so I am closing it. Please re-open if you think we missed something.
apksigcopier CI is green again with 3.11rc2 and the workaround removed, so it does indeed seem to be fixed, as expected.
Thanks!