
RFC: Using VCS snapshots

Open fosslinux opened this issue 3 years ago • 8 comments

Despite our best efforts, pregenerated files sometimes slip through the cracks, as happened with coreutils 5.0 today. There have been a couple of other cases too.

Perhaps it would be a better idea to source all of our distfiles from VCS (normally git) snapshots?

Generally speaking, git snapshots are less likely to contain pregenerated files. Some will obviously still have them, but there is less room for a manual audit to miss them.

Thoughts?

fosslinux avatar May 29 '22 00:05 fosslinux

I'd agree. You can also use a script similar to https://github.com/schierlm/FullSourceBootstrapFromGit/blob/main/check-swh.sh to check that these git/svn repos are indeed archived by softwareheritage.org. So even if the git repo goes away, there will be a secondary source. (NB: the script contains a few bashisms like associative arrays, and it would probably be easier to rewrite it in Python than to remove those bashisms.)

Another option would be to generally consult a diff against VCS when manually auditing new packages.

Yet another option would be to generate the Makefiles and check whether any Makefile targets name files that already exist. Or check what is deleted by targets like make maintainer-clean (e.g. by invoking them and diffing the result).
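To make the "diff against VCS" idea concrete, here is a minimal sketch that lists files present in an unpacked release tarball but absent from a checkout of the matching VCS tag. Such files are only candidates for being pregenerated and still need a manual look. The function names and the two directory arguments are hypothetical; the sketch assumes both trees are already unpacked locally and compares file names only, not contents.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root, skipping .git."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata so it never shows up as a difference.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return found


def tarball_only_files(tarball_dir, vcs_dir):
    """Files shipped in the tarball that the VCS checkout does not contain."""
    return sorted(list_files(tarball_dir) - list_files(vcs_dir))
```

For example, `tarball_only_files("coreutils-5.0", "coreutils-git")` (both paths hypothetical) would typically surface things like `configure` or generated parser sources, which can then be reviewed by hand.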

schierlm avatar May 29 '22 07:05 schierlm

I don't have a strong opinion. If we decide to go with git snapshots, I guess that's fine. That said, the main pregenerated-file offenders are usually GNU tarballs; the rest are usually fairly good (most just need autoreconf -fi).

With git snapshots there is a risk of non-content changes: e.g. after a remote server upgrade, git snapshots might be generated with a newer gzip/tar and potentially have a different checksum (not sure if that happens in practice).
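One possible way to tell a non-content change from a real one is to hash what is inside the archive rather than the archive bytes. The following is only a sketch under stated assumptions, not something the project does: `tar_content_digest` is a hypothetical helper that hashes member names, file contents, and link targets, and deliberately ignores timestamps, ownership, member order, and compression. It also ignores file modes, so a changed executable bit would go unnoticed.

```python
import hashlib
import tarfile


def tar_content_digest(path):
    """Hash member names and contents, ignoring mtime, owner, order, and compression."""
    digest = hashlib.sha256()
    with tarfile.open(path, "r:*") as archive:
        # Sort by name so member order inside the archive does not matter.
        members = sorted(archive.getmembers(), key=lambda m: m.name)
        for member in members:
            digest.update(member.name.encode("utf-8") + b"\0")
            if member.isfile():
                data = archive.extractfile(member).read()
                digest.update(hashlib.sha256(data).digest())
            elif member.issym() or member.islnk():
                digest.update(b"link:" + member.linkname.encode("utf-8"))
    return digest.hexdigest()
```

If two downloads of the same snapshot have different file checksums but the same content digest, the difference comes from how the archive was generated and not from the source files.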

But at the very least, it's probably a good idea to run every build step from a git snapshot at least once (manually, if we don't switch in the end). That way we should catch most of the remaining pregenerated-file issues.

stikonas avatar May 29 '22 21:05 stikonas

As long as git snapshots are obtained through git, that sounds fine. In general, relying on systems like GitHub to generate downloadable archives is really fragile: as we have noticed, they want to preserve the right to change their git archive generation algorithms, which can change checksums without much warning.

nanonyme avatar Feb 28 '23 10:02 nanonyme

There's also the question of whether it will still be possible to handle the "this needs to be downloaded without HTTPS" cases if we switch to VCS snapshots.

nanonyme avatar Oct 14 '23 19:10 nanonyme