rpm RFE: read sources checksums from the SPEC file and verify them

In SPEC files there is the Source0/SourceX directive to specify the sources. Example:

Source0: http://releases.nixos.org/patchelf/patchelf-%{version}//%{name}-%{version}.tar.bz2

cf. https://github.com/gsauthof/copr-epel/blob/master/patchelf/patchelf.spec#L9

Other software then uses those keys to download the referenced files. For example, one can tell rpmbuild to auto-download all sources, Copr auto-downloads the sources (by default) or the user evaluates the SPEC file with rpmspec -P and copy'n'pastes into a curl/wget command line.

Either way, as-is, there is no way to specify cryptographic checksum inside the SPEC file for the sources specified there. For example like this:

Source0-sha256: a0f65c1ba148890e9f2f7823f4bedf7ecad5417772f64f994004f59a39014f83

With such syntax available it would be great if rpm would verify each source against the specified checksum and fail if there is a mismatch.

cf. https://bugzilla.redhat.com/show_bug.cgi?id=1536846#c7 for an example where this came up

As-is, it seems, one has to maintain checksums outside the .spec file, be cautious to manually check checksums or just ignore the problem - which is all tedious and error-prone or dangerous.

For example, because of this I had to roll my own verification logic for my Copr packages (as described in https://bugzilla.redhat.com/show_bug.cgi?id=1536846#c7).

Thus, adding this capability directly to rpm would reliably protect against man-in-the-middle (MITM) attacks.

Note that restricting auto-downloads to secure protocols like https isn't a substitute for this feature as a) not all sources are available via https and b) the infrastructure behind the https endpoint can have security issues (e.g. a temporary intrusion where files are getting replaced).

See also the discussion around https://pagure.io/copr/copr/pull-request/211#comment-43965 how Copr disabled insecure protocols for auto-downloads.

Other package formats support specifying checksums for the referenced source files (e.g. Arch/Gentoo/Homebrew source packages).

Jul 13 '18 16:07 gsauthof

I'm perfectly fine with the rpm5 syntax and probably with most possible alternatives - as long as they are usable.

The rpm5 description I could find in the changelog mentions:

digest([<algorithm>:]/path) = hex

I verify checksums all the time. The vast majority of open-source upstream also cares as you can see how much effort is put into publishing checksums for releases (e.g. published via independent channels, checksum files/release notes with checksums signed with a GPG key that is trusted by well-known project members etc.). I only have to deal with software with missing checksums when I have to get software from clueless corporations like Oracle (e.g. the download is only provided over http and often there is no checksum available at all, or just a md5 one or even just a CRC one).

I would argue that updating a checksum to the SPEC file is less tedious as the status-quo where you have to implement some outside checksum verification process/scripting. Also, with a new upstream release you already have to edit the spec file to adjust the version information.

Regarding 'SRPMs in rpm' - the SRPM isn't always the starting point. For example, for the quite popular Copr service it is not.

I can't see how adding checksums to a source downloads cache in a service like Copr would improve security. Because then the file is just downloaded (and checksummed) on the first request. And the source file could be very well comprised on that first download. Depending on the cache size and rebuild-frequency there are no more re-downloads such that such a compromise wouldn't be detected.

Jul 14 '18 10:07 gsauthof

Would be really useful to have (of course should be optional to keep old spec files working).

Mar 20 '20 13:03 berolinux

In Fedora, we're considering having sources auto-fetched and uploaded to the lookaside cache as part of accepting PRs for package updates, but to make this straightforward, we'd want this to be in the spec file so that we know we're downloading what we're supposed to.

Something along the lines of SourceN(<checksum-type>): <checksum-hash> (and similar for PatchN, of course) was what I'm thinking of.

There's some prior art here in other package management solutions that influenced this idea. For example, Arch and Alpine both let you specify a source URL and specify checksums to validate this, as does Solus' ypkg YAML format.

For example, the Arch PKGBUILD for rpm-tools has the following snippet:

source=(http://ftp.rpm.org/releases/rpm-$_base_pkgver/rpm-$pkgver.tar.bz2
	rpmextract.sh
        rpmlib-filesystem-check.patch)
sha256sums=('ddef45f9601cd12042edfc9b6e37efcca32814e1e0f4bb8682d08144a3e2d230'
            '3e5bf450d4628366ba35469ec0530a99cd09ab2616a3d261a3f68270f481f777'
            'bd0e6dbd458f990268c60324190c6825b234647ecdde08296d2b453dc4bce27a')

And Solus' ypkg YAML for rpm does something similar:

source     :
- https://github.com/rpm-software-management/rpm/archive/rpm-4.14.2.1-release.tar.gz : 92cab9da7524cf4e4abf33f160cdfb9e63fe021bc7133f97735f70cdec777400

Mar 20 '20 13:03 Conan-Kudo

I know I had my own #570 where I already proposed 3 possibilities:

Source0: ftp://ftp.example.com/pub/foo/%{name}-%{version}.tar.gz
Checksum0: SHA512  = 2c8211ae5f1578502dc9b29babe7d03ec61f500b3c2dd309be2bbd34fd194abba29d95812e7dab4bfacda13e342323921663464bab4cbf4af0a198e8437233f4

Source0: SHA512 (ftp://ftp.example.com/pub/foo/%{name}-%{version}.tar.gz) = 31bacf58469953282cd5d8b51862dcf4b84dedb927c1871bc3fca32fc157fe49187631575a70838705fe246f4555647577a7ecc26894445a7d64de5503dc11b4

Provides: checksum(SHA512 (%{name}-%{version}.tar.gz) = 31bacf58469953282cd5d8b51862dcf4b

but now thinking about this again, what is the problem with the sources file in dist-git? It already contains checksums.

Mar 20 '20 13:03 voxik

but now thinking about this again, what is the problem with the sources file in dist-git? It already contains checksums.

We cannot rely on this file if we want rpm to be able to auto-download sources with any degree of confidence.

Per the comment in the macros.in file:

#
# Should rpm try to download missing sources at build-time?
# Enabling this is dangerous as long as rpm has no means to validate
# the integrity of the download with a digest or signature.
%_disable_source_fetch 1

This was the rationale for my filing #1126...

Mar 20 '20 14:03 Conan-Kudo

We cannot rely on this file if we want rpm to be able to auto-download sources with any degree of confidence.

Why? Technically there shouldn't be a problem.

Mar 20 '20 14:03 praiskup

@praiskup I think what @Conan-Kudo meant is (correct me if I'm wrong), that if we have the hashes only in sources and not in the spec file, a plain local rpmbuild can't (safely) download missing sources.

Mar 20 '20 15:03 nphilipp

@praiskup I think what @Conan-Kudo meant is (correct me if I'm wrong), that if we have the hashes only in sources and not in the spec file, a plain local rpmbuild can't (safely) download missing sources.

This is correct.

Mar 20 '20 15:03 Conan-Kudo

Currently, fedpkg downloading and uploading functionality is independent of rpm which brings an advantage that we can use it to upload/download custom binary stuff for other "package" types (e.g. containers).

So placing the source hashes into spec file itself would limit the lookaside-cache functionality to rpms only.

There is another factor that currently when I call fedpkg sources I know I am safe because it doesn't parse spec file.

If we put the hashes into spec file, it would mean fedpkg sources would suddenly become a different beast with different security requirements (which should be considered when doing this automatically on production systems).

Mar 20 '20 16:03 clime

For me, causal user of OS sometimes building small packages, current way of building rpms is very confusing. There are various tools which duplicate functionality: rpmbuild, mock, rpkg, fedpkg etc. It's much harder to understand than PKGBUILDs on Arch Linux for example. For me having ability to declare checksums in rpmspec in standardized way would make UX much better.

This is the reason of huge success of AUR and PKGBUILDs in Arch Linux world — simplicity.

Mar 20 '20 16:03 TomaszGasior

For me, causal user of OS sometimes building small packages, current way of building rpms is very confusing. There are various tools which duplicate functionality: rpmbuild, mock, rpkg, fedpkg etc. It's much harder to understand than PKGBUILDs on Arch Linux for example. For me having ability to declare checksums in rpmspec in standardized way would make UX much better.

I am not sure what it would actually bring. It is good that you can make %_disable_source_fetch 0 trustworthy but one could also do e.g.

Source0: https://github.com/release-engineering/dist-git/archive/dist-git-1.12-1.tar.gz#sha256

with the current rpm syntax and then have a macro that verifies the tarball checksum based on the #sha256. Is it too hacky? :)

It might not even be a hack because https://tools.ietf.org/html/rfc2396#section-4.1 - the fragment identifier is in the realm of client.

So imho, new rpm syntax is not even needed.

I think we could extend rpm macros to support this and it would be nice e.g. for COPR.

It's a pity that url fragment already seems to be used for something: https://fedoraproject.org/wiki/Packaging:SourceURL#Troublesome_URLs

but that maybe it doesn't necessarily mean we couldn't extend its use...? Btw. there is a note:

"Sometimes this does not work because the upstream cgi tries to parse the fragment "

that would be a server-side error according to the RFC.

But for Fedora this has still a limited use because we would like to be independent of any of github/gitlab/bitbucket stuff.

We could use that information (the hash) to automatically fetch the sources from upstream the first time when they are not yet present in our (Fedora) lookaside cache and then further rely on the sources file, the checksum there and the lookaside cache but I think this is a problematic solution. It would mean build system would need to commit into dist-git, which would mess up automatic versioning (if there is any). I haven't thought about this too much, there might be other issues.

I think the fedpkg upload way is quite fine and it can be automated too (outside of Fedora infra) if one trusts 100% to the tarball source/provider. It's hard to tell if such automation should be recommended.

This is the reason of huge success of AUR and PKGBUILDs in Arch Linux world — simplicity.

Mar 20 '20 21:03 clime

...Actually the fragment part shouldn't even be sent to the server so that note should probably be fixed...

Mar 20 '20 21:03 clime

One problem with the sources file is that it is distro specific -- Fedora uses sources, OpenMandriva uses a similar file (though with slightly different syntax) called .abf.yml, probably other distributions have yet other workarounds.

Another problem is that it's not the spec file -- I don't think we want to end up with a mess similar to what dpkg has in those debian directories.

Mar 23 '20 16:03 berolinux

One problem with the sources file is that it is distro specific -- Fedora uses sources, OpenMandriva uses a similar file (though with slightly different syntax) called .abf.yml, probably other distributions have yet other workarounds.

Another problem is that it's not the spec file -- I don't think we want to end up with a mess similar to what dpkg has in those debian directories.

I think it is distro-specific because it is tight down to particular distribution-git that some higher-level tools work with (fedpkg, centpkg, rpkg, ...), i.e. where the packages are stored.

But the SourceX: in rpm spec file typically does not reference a tarball in that particular dist-git but it instead references sources from upstream (at least in Fedora/CentOS and for OpenMandriva that seems to be true as well). But that's not from where the tarballs are downloaded when they are going to be built.

Mar 23 '20 17:03 clime

Anyway, what about something like %_verify_fetched_source_checksums macro with values 0/1/2 where

0: do not check source checksums even if present
1: check source checksums if present as `#(<hashtype>)<checksum>` url suffix by invoking `<hashtype>sum` command from coreutils to do the check
2: check source checksums and return false when some fetched source does not have a checksum attached

I imagine the verification would only apply to files that were fetched by rpmbuild. Those that were already present before the build started wouldn't be checked with assumption that user gave to rpmbuild valid sources. What do you think?

Mar 23 '20 19:03 clime

Thinking further about this, do we actually need something really fancy as special tag?

For example, it is quite easy to check if the checksums in the dist-git sources file are correct during rpmbuild -bs. It is enough to put %(sha512sum -c sources) somewhere into specfile preamble. If the checksums are not correct then the rpmbuild -bs fails.

Looking at Verifying Signatures Fedora guidelines, I actually wonder, why the check is done in %prep. Maybe something like this macro should be placed right bellow the sources in preamble.

Mar 24 '20 08:03 voxik

I actually wonder, why the check is done in %prep.

This might be due to BR: gnupg2, have not investigated this.

Mar 24 '20 08:03 voxik

@voxik That is for verifying GPG signatures if they exist, and yes, it requires gnupg2 as a build dependency in that scenario.

Mar 24 '20 10:03 Conan-Kudo

@voxik That is for verifying GPG signatures if they exist, and yes, it requires gnupg2 as a build dependency in that scenario.

But if that was macro included in RPM, RPM would probably grow the dependency, so it would not need BR. So I still think this could be way forward.

Mar 24 '20 12:03 voxik

Hello @voxik, sha256sum etc. are in coreutils, which I bet rpm already requires...i mean coreutils should be present on any system anyway.

An interesting idea with %(sha512sum -c sources) but I wouldn't bring the sources file into the picture because it is used to fetch files from dist-git before rpmbuild even happens and checksums are checked at that stage. All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'. Also maybe it would be more pleasant to have the support for this in rpm than to put those snippets into spec.

Mar 24 '20 12:03 clime

RPM does not specifically require GNU coreutils. And sha256sum and friends are not specifically available on all platforms RPM is used on.

Mar 24 '20 12:03 Conan-Kudo

Hmm, on Fedora:

$ dnf repoquery --requires rpm
Last metadata expiration check: 0:00:08 ago on Tue 24 Mar 2020 01:34:47 PM CET.
/usr/bin/bash
/usr/bin/db_stat
/usr/bin/sh
coreutils
curl
libarchive.so.13()(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libpopt.so.0()(64bit)
libpopt.so.0(LIBPOPT_0)(64bit)
libpthread.so.0()(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
librpm.so.9()(64bit)
librpmio.so.9()(64bit)
popt(x86-64) >= 1.10.2.1
rtld(GNU_HASH)

I can see coreutils there. So I guess you mean on other distributions...if you can name those and where sha512sum and similar checksum checking utils can be found, that would be great!

Generally speaking....Is adding dependency to rpm for checksum checking utilities a big deal? It seems to me that it is something that rpm could support.

Mar 24 '20 12:03 clime

On macOS, there is not a consistent interface for doing checksums via CLI. I'm unsure if AIX and other platforms would also have similar problems.

Mar 24 '20 12:03 Conan-Kudo

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source

Yep, something like this is where I was heading to.

but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'

RPM exposes %{SOURCE1} right after it appears in the specfile.

All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

No, the %{SOURCE1} points to the rpmbuild directory, not to the URL. I don't think that fetching sources by RPM is really in question here, but it could be used.

Mar 24 '20 12:03 voxik

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source

Yep, something like this is where I was heading to.

but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'

RPM exposes %{SOURCE1} right after it appears in the specfile.

All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

No, the %{SOURCE1} points to the rpmbuild directory, not to the URL. I don't think that fetching sources by RPM is really in question here, but it could be used.

Alright but I think we are maybe tracking different goals then :). I thought the issue was about making %_disable_source_fetch 0 reliable. It's what the original post suggested to me.

Mar 24 '20 13:03 clime

Alright but I think we are maybe tracking different goals then :). I thought the issue was about making %_disable_source_fetch 0 reliable. It's what the original post suggested to me.

It is about that. We don't want to operate on sources unless they've been validated to be good. That means no doing %prep, %generate_buildrequires, etc. until validated if rpm is downloading the sources.

Mar 24 '20 13:03 Conan-Kudo

Cool, ok, I admit I am not sure how big deal is the macOS compatibility. I think if somebody complains that the feature doesn't work there, it would be a reason to instead use some c library for the checksum validation. Maybe that could be done immediately? I would still consider relying on coreutils a good start.

Mar 24 '20 13:03 clime

But, yeah, maybe you guys have some other/better solution in mind. Idk.

Mar 24 '20 13:03 clime

In reply to https://github.com/rpm-software-management/rpm/issues/463#issuecomment-601922444:

Source0: https://github.com/release-engineering/dist-git/archive/dist-git-1.12-1.tar.gz#sha256

with the current rpm syntax and then have a macro that verifies the tarball checksum based on the #sha256. Is it too hacky? :)

Yes, this looks ultra hacky to me.

I can see a lot of potential for confusion with this #hashsuffix syntax.

You can't expect that a package contributor would have that part of that RFC present which is necessary to understand the mechanism. And even then, it's far from obvious or similar to any other existing mechanism for declaring checksums of dowloadable files.

Mar 29 '20 17:03 gsauthof

Well, on the other hand, it is a minimalistic solution that doesn't introduce any new rpm constructs. Also, you don't need to know or read the RFC. You just need to read the rpm documentation where it is described that source checksums are attached to source URLs as #(<checksum-type>)<checksum> suffixes. With an example, I think, people would get it immediately.

Mar 29 '20 18:03 clime

rpm rpm copied to clipboard

RFE: read sources checksums from the SPEC file and verify them

rpm
rpm copied to clipboard