
Deterministically normalize wheel ZIP metadata

Open • tabbyrobin opened this issue 8 months ago • 1 comment

To enable reproducible builds, it would be nice if auditwheel could deterministically normalize the ZIP metadata in the wheels it repairs.

Ideally, this would either be done by default or be easy to enable with a simple option (for example, --deterministic or similar).

Sources of nondeterminism in ZIP metadata include:

  • Timestamps: auditwheel already normalizes ZIP timestamps if SOURCE_DATE_EPOCH is set. However, it would be helpful to normalize them even when SOURCE_DATE_EPOCH is not set.
  • File permissions and ownership
  • Ordering of ZIP entries: this already seems to be effectively deterministic, at least in the handful of experiments I've run. I'm not sure what nondeterminism might arise across operating systems, OS variants, Python build backends, etc.
  • Potentially other things
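
To make the list above concrete, here is a minimal sketch of what such a normalization pass could look like using only Python's standard zipfile module. It is not auditwheel's implementation: the function names (fixed_date_time, normalize_wheel), the 1980-01-01 fallback timestamp, the 0644/0755 permission policy, and the choice to sort entries by name with .dist-info files last are all assumptions made for illustration.

```python
import os
import time
import zipfile

# 1980-01-01 00:00:00 UTC, the earliest timestamp the ZIP format can store.
ZIP_EPOCH = 315532800


def fixed_date_time():
    """Return the timestamp to stamp on every entry (hypothetical policy)."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        # Assumption: fall back to the ZIP epoch when the variable is unset.
        return (1980, 1, 1, 0, 0, 0)
    return tuple(time.gmtime(max(int(epoch), ZIP_EPOCH))[:6])


def normalize_wheel(src_path, dst_path):
    """Rewrite a ZIP archive with fixed timestamps, modes, and entry order."""
    date_time = fixed_date_time()
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # Assumption: sort by name, keeping .dist-info entries at the end.
        entries = sorted(
            src.infolist(),
            key=lambda i: (".dist-info/" in i.filename, i.filename),
        )
        for info in entries:
            data = src.read(info.filename)
            # A fresh ZipInfo carries no extra fields, so any ownership or
            # high-resolution time data stored there is dropped.
            out = zipfile.ZipInfo(info.filename, date_time=date_time)
            out.compress_type = zipfile.ZIP_DEFLATED
            out.create_system = 3  # always record "Unix", whatever the host OS
            old_mode = (info.external_attr >> 16) & 0o777
            if info.is_dir():
                mode = 0o40755
            elif old_mode & 0o111:
                mode = 0o100755  # keep the executable bit, normalize the rest
            else:
                mode = 0o100644
            out.external_attr = mode << 16
            if info.is_dir():
                out.external_attr |= 0x10  # MS-DOS directory flag
            dst.writestr(out, data)
```

Usage would be along the lines of normalize_wheel("in.whl", "out.whl"). Note that this only fixes the metadata: the compressed bytes can still differ between zlib versions, and any change to the file contents would also require regenerating RECORD.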

I am filing this issue with auditwheel in the hope that a fix here, in a centralized tool, could serve as a blanket solution and improve wheel reproducibility throughout the Python ecosystem.

See also:

  • https://github.com/pypa/cibuildwheel/issues/2344
  • https://github.com/pyca/cryptography/issues/12811

tabbyrobin, Apr 27 '25 07:04

I have put together some notes about deterministic Python wheels: https://github.com/tabbyrobin/expt-repro-python-wheels/blob/main/notes-on-wheel-determinism.md

I also started a thread on the subject, hoping to spark discussion with various projects and the wider Python community: https://discuss.python.org/t/best-practices-for-deterministically-normalizing-wheel-zip-metadata/90662

tabbyrobin, May 04 '25 05:05