setuptools
support for SOURCE_DATE_EPOCH in sdist.
SOURCE_DATE_EPOCH is useful for reproducible builds: when it is set, no timestamp in the output should be greater than its value.
It seems that setuptools sdist does not support SOURCE_DATE_EPOCH. I've traced it to the following:
sdist inherits from Command, which leads to these successive calls:
Lib/distutils/cmd.py:Command.make_archive
Lib/distutils/archive_util.py:make_archive
Lib/distutils/archive_util.py:ARCHIVE_FORMATS
Lib/distutils/archive_util.py:make_tarball
make_tarball seems to be the right place to monkeypatch to look for SOURCE_DATE_EPOCH, since it can itself pass a filter to tarfile.add(), which would ensure the mtime is bounded (it already passes a filter to set uid/gid).
With this, most sdist formats (except tgz) are reproducible. TGZ has one remaining problem: GzipFile adds time.time() to the header, and that is a bit harder to patch.
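As a rough illustration of the filter idea, here is a minimal sketch of a `tarfile.add()` filter that caps each member's mtime at SOURCE_DATE_EPOCH. The names `clamp_mtime` and `make_plain_tar` are hypothetical and not part of distutils or setuptools; the sketch writes an uncompressed tar, so the gzip header issue does not come into play.

```python
import os
import tarfile


def clamp_mtime(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo:
    """Cap a member's mtime at SOURCE_DATE_EPOCH, if that variable is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Older files keep their mtime; newer ones are pulled back to the epoch.
        tarinfo.mtime = min(tarinfo.mtime, int(epoch))
    return tarinfo


def make_plain_tar(archive_path: str, source_dir: str) -> None:
    """Hypothetical helper: archive source_dir into an uncompressed tar."""
    with tarfile.open(archive_path, "w") as tar:
        # tarfile.add() calls the filter once per member before writing it.
        tar.add(source_dir, arcname=os.path.basename(source_dir), filter=clamp_mtime)
```

A real patch would presumably combine this with the uid/gid filter that make_tarball already installs, rather than replace it.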
There's some excellent work towards this started in #2136, thanks @Carreau! Are you planning to pick this up? If not, perhaps I could help finish up this work?
We would like to be able to produce reproducible sdists for python-tuf. (Curious readers can see: https://github.com/theupdateframework/tuf/issues/1269)
> Are you planning to pick this up? If not, perhaps I could help finish up this work?
At some point, but I don't have much time these days. Feel free to take over.
I'm interested in reproducible sdists, too. Reproducible artifacts make it much easier to verify the provenance of code.
- BPO https://bugs.python.org/issue31526 is the Python bug report for the gzip timestamping issue.
- You can find my working implementation of reproducible tar.bz2 at https://src.fedoraproject.org/rpms/python-cryptography/blob/rawhide/f/vendor_rust.py
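For readers who control how the archive is written, one way to sidestep the gzip header timestamp is to create the `gzip.GzipFile` yourself and pass its `mtime` argument, then let `tarfile` write into it. The sketch below is only an illustration of that idea (it is not the Fedora script linked above); `deterministic_tgz` is a hypothetical helper that packs in-memory data.

```python
import gzip
import io
import tarfile


def deterministic_tgz(files: dict, epoch: int) -> bytes:
    """Hypothetical helper: pack {name: bytes} into a .tar.gz with fixed metadata."""
    buf = io.BytesIO()
    # mtime=epoch pins the timestamp field of the gzip header;
    # filename="" keeps the optional file name field out of the header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # fixed member order
                data = files[name]
                info = tarfile.TarInfo(name)  # defaults: uid=gid=0, empty uname/gname
                info.size = len(data)
                info.mtime = epoch
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Calling the helper twice with the same inputs should yield byte-identical output, which is the property a reproducible sdist needs.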
Just in case this is useful to others, I paste below a self-contained hunk of monkeypatching that allowed me to get reproducible (same sha256 hash) sdist tarballs. This hunk of code can be dumped in setup.py, for example.
# Support for Reproducible Builds
# https://reproducible-builds.org/docs/source-date-epoch/
import os

timestamp = os.environ.get('SOURCE_DATE_EPOCH')
if timestamp is not None:
    import distutils.archive_util as archive_util
    import stat
    import tarfile
    import time

    timestamp = float(max(int(timestamp), 0))

    # Stand-in for the time module while the tarball is written.
    class Time:
        @staticmethod
        def time():
            return timestamp

        @staticmethod
        def localtime(_=None):
            return time.localtime(timestamp)

    # Descriptor that masks out group/other write bits on assignment.
    class TarInfoMode:
        def __get__(self, obj, objtype=None):
            return obj._mode

        def __set__(self, obj, stmd):
            ifmt = stat.S_IFMT(stmd)
            mode = stat.S_IMODE(stmd) & 0o7755
            obj._mode = ifmt | mode

    # Descriptor that always returns a fixed value and ignores assignment.
    class TarInfoAttr:
        def __init__(self, value):
            self.value = value

        def __get__(self, obj, objtype=None):
            return self.value

        def __set__(self, obj, value):
            pass

    class TarInfo(tarfile.TarInfo):
        mode = TarInfoMode()
        mtime = TarInfoAttr(timestamp)
        uid = TarInfoAttr(0)
        gid = TarInfoAttr(0)
        uname = TarInfoAttr('')
        gname = TarInfoAttr('')

    def make_tarball(*args, **kwargs):
        tarinfo_orig = tarfile.TarFile.tarinfo
        try:
            tarfile.time = Time()
            tarfile.TarFile.tarinfo = TarInfo
            return archive_util.make_tarball(*args, **kwargs)
        finally:
            # Always restore the real time module and TarInfo class.
            tarfile.time = time
            tarfile.TarFile.tarinfo = tarinfo_orig

    archive_util.ARCHIVE_FORMATS['gztar'] = (
        make_tarball, *archive_util.ARCHIVE_FORMATS['gztar'][1:],
    )
A few explanations follow:
- The timestamp value has to be converted to `float`. Keeping it as `int` does not work: the final tarball will miss the PAX header, because the code in `tarfile` assumes `mtime` to be a `float`.
- I had to replace `tarfile.time` to prevent the current timestamp from being injected into the compressed gzip stream.
- The monkeypatch of `TarInfo.mode` may not be strictly necessary, but it helps with different user umask settings.
- The username/groupname and userid/groupid information stored in the tarball has to be overridden. To keep things simple and generic enough ~I just picked user/users and 1000/100.~ and following the recommendation from @haampie in the comment below, the username/groupname are set to the empty string and userid/groupid are set to zero.
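To make the umask point concrete, here is a small worked example of the same masking that the mode descriptor applies. The function name `normalize_mode` is only illustrative; the mask `0o7755` is the one used in the snippet.

```python
import stat


def normalize_mode(st_mode: int) -> int:
    """Keep the file type bits, drop group/other write permission."""
    file_type = stat.S_IFMT(st_mode)            # e.g. regular file, directory
    perms = stat.S_IMODE(st_mode) & 0o7755      # clear the 0o022 bits
    return file_type | perms


# A file created under umask 002 (rw-rw-r--) and one created under
# umask 022 (rw-r--r--) end up with the same stored mode.
assert normalize_mode(0o100664) == normalize_mode(0o100644) == 0o100644
```

Without the mask, two people building the same source tree under different umasks would store different permission bits and therefore get different hashes.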
PS: Maybe this approach is simple enough to incorporate into setuptools?
@dalcinl thanks, that is great!
Better to use uid = gid = 0, and set uname/gname to empty string.
Otherwise you're in for fun surprises when extracting the tarball as root on systems that have uid/gid 1000.
In particular, files that are executable only by their owner would now be executable by whoever has uid 1000 on that system, which can be a security issue.
Python itself notably tries to change ownership in tarfile: https://github.com/python/cpython/blob/2bbbab212fb10b3aeaded188fb5d6c001fb4bf74/Lib/tarfile.py#L2530
> Better to use uid = gid = 0, and set uname/gname to empty string.
I've updated the code snippet as per your recommendation. Thanks.
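If you want to check that a tarball actually follows these conventions, a small inspection helper can list the members that still carry a later timestamp, real ownership, or group/other write bits. This is a sketch only; `unnormalized_members` is a hypothetical name, and the criteria simply mirror the ones discussed in this thread.

```python
import io
import tarfile


def unnormalized_members(archive: bytes, epoch: int) -> list:
    """Return names of members whose metadata is not normalized."""
    bad = []
    # "r:*" lets tarfile detect the compression (none, gz, bz2, xz) itself.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            ok = (
                member.mtime <= epoch
                and member.uid == 0
                and member.gid == 0
                and member.uname == ""
                and member.gname == ""
                and member.mode & 0o022 == 0  # no group/other write bits
            )
            if not ok:
                bad.append(member.name)
    return bad
```

An empty list for a freshly built sdist is a quick sanity check before comparing sha256 hashes across machines.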
The snippet from @dalcinl is very helpful, thanks!
For those who don't have a setup.py at all and normally use only pyproject.toml, I've adapted the idea into a build backend at https://github.com/wimglenn/setuptools-reproducible, which can be used like this:
[build-system]
requires = ["setuptools-reproducible"]
build-backend = "setuptools_reproducible"
Some notes:
- This wraps setuptools, and otherwise behaves identically to the `setuptools.build_meta` backend.
- I did not bother to patch `distutils.archive_util`; we can just patch the `tarfile` module directly. With PEP 517 we're already working in an isolated environment when the build backend hooks are called.
- The patch of `localtime` seems unnecessary: it's only used by the TarFile.list method, which is not needed at build time. I've left it out.
It's tested on {macOS, Linux, Windows} x Py-3.{8,9,10,11,12}. It does not work on Python 3.7 (and I did not bother to investigate why, since 3.7 is EOL now).
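To give an idea of what "patching `tarfile` directly" can look like, here is a minimal sketch of a context manager that freezes the clock seen by the `tarfile` module for the duration of a build step. It is not the code from setuptools-reproducible; `_FixedClock` and `frozen_tarfile_clock` are hypothetical names, and the sketch relies on the implementation detail that `tarfile` looks up `time.time()` through its module-level `time` attribute.

```python
import contextlib
import tarfile
import time


class _FixedClock:
    """Stand-in for the time module as seen from inside tarfile."""

    def __init__(self, epoch):
        self._epoch = float(epoch)

    def time(self):
        return self._epoch

    def __getattr__(self, name):
        # Anything other than time() is delegated to the real module.
        return getattr(time, name)


@contextlib.contextmanager
def frozen_tarfile_clock(epoch):
    """Temporarily make tarfile see `epoch` as the current time."""
    original = tarfile.time
    tarfile.time = _FixedClock(epoch)
    try:
        yield
    finally:
        tarfile.time = original  # restore even if the build step fails
```

A wrapping backend could, under these assumptions, run the underlying sdist hook inside `with frozen_tarfile_clock(epoch):` so that the gzip header written in stream mode (`"w|gz"`) carries the fixed timestamp.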
FTR here's a hack Ansible uses to make sdists reproducible: https://github.com/ansible/ansible/blob/03d6209/packaging/release.py#L867-L899.