support for SOURCE_DATE_EPOCH in sdist.

Open Carreau opened this issue 5 years ago • 9 comments

SOURCE_DATE_EPOCH is useful for reproducible builds: when it is set, no timestamp in the build output should be later than this value.

It seems that setuptools sdist does not support SOURCE_DATE_EPOCH. I've traced it to the following:

sdist inherits from Command, which leads to these successive calls:

Lib/distutils/cmd.py:Command.make_archive
Lib/distutils/archive_util.py:make_archive
Lib/distutils/archive_util.py:ARCHIVE_FORMATS
Lib/distutils/archive_util.py:make_tarball

make_tarball seems to be the right place to monkeypatch to look for SOURCE_DATE_EPOCH, as it can itself pass a filter to tarfile.add() that ensures the mtime is bounded (it already passes a filter to set uid/gid).
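
A minimal sketch of what such a filter could look like, using only the documented `filter=` argument of `TarFile.add()`. The name `clamp_mtime`, the example epoch value, and the demo file layout are assumptions made for illustration; this is not the code distutils or setuptools ship.

```python
import io
import os
import tarfile
import tempfile


def clamp_mtime(tarinfo):
    """tarfile.add() filter: cap mtime at SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Never let a member's timestamp exceed the requested epoch.
        tarinfo.mtime = min(tarinfo.mtime, int(epoch))
    return tarinfo


if __name__ == "__main__":
    # Arbitrary example value; a real build would inherit it from the environment.
    os.environ.setdefault("SOURCE_DATE_EPOCH", "1590343500")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.txt")
        with open(path, "w") as fh:
            fh.write("hello\n")
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            tar.add(path, arcname="pkg-1.0/hello.txt", filter=clamp_mtime)
        # Read the archive back and show the stored timestamps.
        buf.seek(0)
        with tarfile.open(fileobj=buf) as tar:
            for member in tar:
                print(member.name, member.mtime)
```

Older timestamps are left untouched, which matches the "no timestamp greater than this value" reading of SOURCE_DATE_EPOCH; a stricter variant could set every mtime to the epoch unconditionally.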

With this, most sdist formats (except tgz) are reproducible. TGZ has one remaining problem: GzipFile writes time.time() into the header, and that's a bit harder to patch.
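
For illustration, here is a sketch of the gzip side of the problem outside of distutils. When the archive is created by hand, `gzip.GzipFile` accepts an `mtime` argument that fixes the header timestamp. The function names, the member name, and the epoch value are made up for the example; the difficulty described above is that distutils' make_tarball does not expose this argument.

```python
import gzip
import io
import struct
import tarfile


def build_tgz(epoch):
    """Build a small .tar.gz in memory with a fixed gzip header timestamp."""
    raw = io.BytesIO()
    # filename="" keeps a file name out of the gzip header;
    # mtime=epoch replaces the default time.time() value.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            data = b"hello\n"
            info = tarfile.TarInfo("pkg-1.0/hello.txt")  # hypothetical member
            info.size = len(data)
            info.mtime = epoch
            tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()


def gzip_header_mtime(blob):
    """Return the MTIME field stored in bytes 4..8 of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


if __name__ == "__main__":
    first = build_tgz(1590343500)
    second = build_tgz(1590343500)
    print(first == second, gzip_header_mtime(first))
```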

Carreau avatar May 24 '20 18:05 Carreau

There's some excellent work towards this started in #2136, thanks @Carreau! Are you planning to pick this up? If not, perhaps I could help finish up this work?

We would like to be able to produce reproducible sdists for python-tuf. (Curious readers can see: https://github.com/theupdateframework/tuf/issues/1269)

joshuagl avatar Feb 09 '21 12:02 joshuagl

Are you planning to pick this up? If not, perhaps I could help finish up this work?

At some point; but I don't have much time these days; feel free to take over.

Carreau avatar Feb 09 '21 17:02 Carreau

I'm interested in reproducible sdists, too. Reproducible artifacts make it much easier to verify the provenance of code.

  • BPO https://bugs.python.org/issue31526 is the Python bug report for the gzip timestamping issue.
  • You can find my working implementation of reproducible tar.bz2 at https://src.fedoraproject.org/rpms/python-cryptography/blob/rawhide/f/vendor_rust.py

tiran avatar Mar 18 '21 07:03 tiran

Just in case this is useful to others, I paste below a self-contained hunk of monkeypatching that allowed me to get reproducible (same sha256 hash) sdist tarballs. This hunk of code can be dumped in setup.py, for example.

# Support for Reproducible Builds
# https://reproducible-builds.org/docs/source-date-epoch/

import os

timestamp = os.environ.get('SOURCE_DATE_EPOCH')
if timestamp is not None:
    import distutils.archive_util as archive_util
    import stat
    import tarfile
    import time

    timestamp = float(max(int(timestamp), 0))

    class Time:
        @staticmethod
        def time():
            return timestamp
        @staticmethod
        def localtime(_=None):
            return time.localtime(timestamp)

    class TarInfoMode:
        def __get__(self, obj, objtype=None):
            return obj._mode
        def __set__(self, obj, stmd):
            ifmt = stat.S_IFMT(stmd)
            mode = stat.S_IMODE(stmd) & 0o7755
            obj._mode = ifmt | mode

    class TarInfoAttr:
        def __init__(self, value):
            self.value = value
        def __get__(self, obj, objtype=None):
            return self.value
        def __set__(self, obj, value):
            pass

    class TarInfo(tarfile.TarInfo):
        mode = TarInfoMode()
        mtime = TarInfoAttr(timestamp)
        uid = TarInfoAttr(0)
        gid = TarInfoAttr(0)
        uname = TarInfoAttr('')
        gname = TarInfoAttr('')

    def make_tarball(*args, **kwargs):
        tarinfo_orig = tarfile.TarFile.tarinfo
        try:
            tarfile.time = Time()
            tarfile.TarFile.tarinfo = TarInfo
            return archive_util.make_tarball(*args, **kwargs)
        finally:
            tarfile.time = time
            tarfile.TarFile.tarinfo = tarinfo_orig

    archive_util.ARCHIVE_FORMATS['gztar'] = (
        make_tarball, *archive_util.ARCHIVE_FORMATS['gztar'][1:],
    )

A few explanations follow:

  1. The timestamp value has to be converted to float. If it is kept as an int, the final tarball will lack the PAX header, because the code in tarfile assumes/expects mtime to be a float.
  2. I had to replace tarfile.time to prevent the current timestamp from being injected into the compressed gzip stream.
  3. The monkeypatch of TarInfo.mode may not be strictly necessary, but it helps with differing user umask settings.
  4. The username/groupname and userid/groupid information stored in the tarball has to be overridden. I originally picked user/users and 1000/100; following the recommendation from @haampie in the comment below, the username/groupname are now set to the empty string and the userid/groupid are set to zero.
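
To make item 3 concrete, the mask used in TarInfoMode can be tried in isolation. This is only a sketch; `normalize_mode` is a name introduced here for the demonstration and does not appear in the snippet above.

```python
import stat


def normalize_mode(st_mode):
    """Apply the same mask as TarInfoMode.__set__ in the snippet above."""
    ifmt = stat.S_IFMT(st_mode)             # file type bits (regular file, dir, ...)
    mode = stat.S_IMODE(st_mode) & 0o7755   # clear group/other write bits
    return ifmt | mode


if __name__ == "__main__":
    # The same file checked out under umask 002 (0o664) and umask 022 (0o644)
    # ends up with identical mode bits in the archive.
    print(oct(normalize_mode(0o100664)), oct(normalize_mode(0o100644)))
```

The effect is that group and other write permissions are dropped, so checkouts made under different umask settings no longer produce different archives.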

PS: Maybe this approach is simple enough to incorporate into setuptools?

dalcinl avatar Aug 24 '23 07:08 dalcinl

@dalcinl thanks, that is great!

Carreau avatar Aug 28 '23 07:08 Carreau

Better to use uid = gid = 0, and set uname/gname to empty string.

Otherwise you're in for fun surprises when extracting the tarball as root on systems that have uid/gid 1000.

In particular, files that are executable only by the owner would now be executable by this uid 1000 user, which can be a security issue.

Python itself notably tries to change ownership in tarfile: https://github.com/python/cpython/blob/2bbbab212fb10b3aeaded188fb5d6c001fb4bf74/Lib/tarfile.py#L2530

haampie avatar Oct 05 '23 12:10 haampie

Better to use uid = gid = 0, and set uname/gname to empty string.

I've updated the code snippet as per your recommendation. Thanks.

dalcinl avatar Oct 05 '23 12:10 dalcinl

The snippet from @dalcinl is very helpful, thanks!

For those who don't have a setup.py at all and normally use pyproject.toml, I've adapted the idea into a build backend at https://github.com/wimglenn/setuptools-reproducible, which can be used like this:

[build-system]
requires = ["setuptools-reproducible"]
build-backend = "setuptools_reproducible"

Some notes:

  • This wraps setuptools, and otherwise behaves identically to the setuptools.build_meta backend.
  • I did not bother to patch distutils.archive_util; we can just patch the tarfile module directly. With PEP 517 we're already working in an isolated environment when the build backend hooks are called.
  • The patch of localtime seems unnecessary; it's only used by the TarFile.list method, which is not needed at build time. I've left it out.
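
As a rough illustration of the second note, the sketch below patches `tarfile.time` directly for the duration of one archive write. It is not taken from setuptools-reproducible. It assumes that CPython's tarfile reads the clock through its module-level `time` import when writing a gzip stream in `w|gz` mode, which is an implementation detail rather than a documented interface. The function name and the archive member are hypothetical.

```python
import io
import tarfile
import types
from unittest import mock


def build_stream_tgz(epoch):
    """Write a tiny tar.gz in stream mode with tarfile's clock pinned to epoch."""
    # Stand-in for the time module that only provides time(), as in the
    # notes above (localtime is left out).
    fake_time = types.SimpleNamespace(time=lambda: float(epoch))
    buf = io.BytesIO()
    # Assumption: tarfile looks up time.time() via its own module global.
    with mock.patch.object(tarfile, "time", fake_time):
        with tarfile.open(fileobj=buf, mode="w|gz") as tar:
            data = b"hello\n"
            info = tarfile.TarInfo("pkg-1.0/hello.txt")  # hypothetical member
            info.size = len(data)
            info.mtime = epoch
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    first = build_stream_tgz(1590343500)
    second = build_stream_tgz(1590343500)
    print(first == second)
```

Using `mock.patch.object` as a context manager restores the real `time` module even if the write fails, which plays the same role as the try/finally in the earlier snippet.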

It's tested on {macOS, Linux, Windows} x Py-3.{8,9,10,11,12}. It does not work on Python 3.7 (and I did not bother to investigate why, since 3.7 is EOL now).

wimglenn avatar May 15 '24 02:05 wimglenn

FTR here's a hack Ansible uses to make sdists reproducible: https://github.com/ansible/ansible/blob/03d6209/packaging/release.py#L867-L899.

webknjaz avatar Jan 23 '25 04:01 webknjaz