flit
flit copied to clipboard
Compute SOURCE_DATE_EPOCH automatically (e.g., from git)
Hi @takluyver !
I was reading https://flit.readthedocs.io/en/latest/reproducible.html , and I was excited to see that reproducible builds are something you've considered with flit. I wondered if there was a way to make reproducibility automatic.
Toward that goal, I'm interested in working with you to design a user flow where flit can compute the SOURCE_DATE_EPOCH from git. I'm also interested in submitting a patch that implements this, with an approximate ETA of 2-3 weeks from when we decide how it should work, if that's helpful. I also have some other questions about this, which you'll find below.
I can think of a few implementation strategies:
-
flitcould automatically check thegit logfor the most recent commit, and use the timestamp as theSOURCE_DATE_EPOCH, if this is a git repository (using the same logic asflitalready uses). -
flit publish(and/orflit build) could accept a flag called e.g.--source-date-from-vcs, so that users have to opt-in to this behavior. -
flit publish(and/orflit build) could accept a flag called--use-timestamps, in which case, we use the current behavior of embedding timestamps (subject toSOURCE_DATE_EPOCH). If that flag is omitted, we use some zero timestamp as the timestamp in wheels and tar files.
I have a few related questions:
-
The doc at https://flit.readthedocs.io/en/latest/reproducible.html seems to imply that a
sdistdoesn't depend on the current time. My experiments show that thesdistbuilt byflit builddo contain timestamps of files. Should I change that doc? Happy to do so. (Context: I didflit buildon a clone of this repo, then didtar -t -v --full-time -f dist/flit-2.2.0.tar.gz | grep 2020-01-31 | head -n1to confirm that I got a value from today, even though the repo is unchanged since a couple of weeks ago.) -
flitis embedding a timestamp of 2016-01-01 for its generated files in the wheel https://github.com/takluyver/flit/blame/93482e44d804b80c770203ec6c2e375170ece9ef/flit_core/flit_core/wheel.py#L173 but not for the user's code. Would it be better and simpler to use this 2016-01-01 timestamp for user code, too? This would also address my goal here, I believe. :) -
If we go ahead with the
git-based implementation strategy, would you want this to work equally well forhgas part of merging the patch? If so, I can commit to adding that to my patch before submitting it for your final review.
Happy to chat more, either here or somewhere lower-latency if you desire.
If this is a direction you're not interested in, it's okay if you close this. :)
Cheers!
Hi Asheesh! Good to hear from you again. :-)
Yes, I think getting the source date from the most recent git commit sounds like a sensible thing to do. I'm still considering whether I want it to be the default behaviour. I'm leaning towards yes, because the timestamps from the files are probably not much use, but I'm not 100% convinced yet.
I only did the reproducibility work for wheels, not sdists. If Python packaging gains more general infrastructure for reproducible builds, I imagine wheels will be the focus, because that's where compiled artifacts can be. Sdists are meant to be source, so it should be easier to diff the unpacked files against e.g. a source tree from git (I've also been thinking about tools to make that easy). So I think reproducible sdists are less valuable. But if you want them and it doesn't make the code significantly harder to look at, I probably won't object.
Re 2016-01-01: I'm reluctant to overwrite real information (the file timestamps) with something entirely meaningless. It's a different calculation for the files flit generates, because they would otherwise have a different timestamp each time you flit build. Overwriting them with VCS timestamps seems more reasonable than with an arbitrary date.
I don't think we need an equivalent hg implementation in the first instance - it's a nice enhancement, but not an essential feature, so doing it with other VCSs can be left for another day.
Great!
I'll just petition you once more for "don't store the dates in the build artifact". :) I think no one really needs the dates within the wheel; the version number should be enough for that. If the timestamps of the files aren't of much use, then we can accomplish the underlying goal with even less code, which is a nice feat.
I'm also still happy to build the git version-based version of this, so I'll intend to do that in the coming weeks unless I hear otherwise, and ping you when I have something for you to look at.
Cheers!
Maybe. In principle, the filesystem timestamps still provide a kind of crude record if someone manages the master copy of the code on a single computer, with no version control. But that's obviously very limited, and a pain to make sense of. And it's common these days to put even small projects in git.
Let's build timestamps-from-git first and see how that goes for a while before changing what it does with no recognised VCS.
+1
Just a note that if you use zero timestamps, you will need to use strict_timestamps=False with zipfile to avoid errors. It might not be a bad idea to do that regardless.
Yup, I actually had to fix that recently for timestamps from SOURCE_DATE_EPOCH (#448).
Ah, interesting. I found that the ZipInfo.from_file call still fails when the file has a pre-1980 mtime. (Long story involving a not-quite-pax compatible unarchiver and the tarfile module.)
Good point. I added the ZipInfo.from_file method while working on Flit, and I only found out about the 1980 limitation much more recently. So it makes sense that from_file doesn't work around that. :slightly_smiling_face:
I ended up doing this to avoid the problem.
--- flit_core/wheel.py.orig 2021-11-15 01:34:30.000000000 +1100
+++ flit_core/wheel.py 2021-11-21 17:00:14.000000000 +1100
@@ -7,6 +7,7 @@
import os
import os.path as osp
import stat
+import sys
import tempfile
from types import SimpleNamespace
from typing import Optional
@@ -102,7 +103,10 @@
rel_path = rel_path.replace(os.sep, '/')
if self.source_time_stamp is None:
- zinfo = zipfile.ZipInfo.from_file(full_path, rel_path)
+ if sys.version_info[:2] >= (3, 8):
+ zinfo = zipfile.ZipInfo.from_file(full_path, rel_path, strict_timestamps=False)
+ else:
+ zinfo = zipfile.ZipInfo.from_file(full_path, rel_path)
else:
# Set timestamps in zipfile for reproducible build
zinfo = zipfile.ZipInfo(rel_path, self.source_time_stamp)