`build.getDependency` is not using versioned artifacts

Open sxa opened this issue 1 year ago • 12 comments

https://ci.adoptium.net/view/all/job/build.getDependency strips the version number from the artifacts it downloads, which makes it impossible to recreate a given download when running a reproducible build test. For example, the CycloneDX update in https://github.com/adoptium/temurin-build/pull/3558 produced a new artifact with the same name but a different SHA. As a result, you can't easily re-run an old version of the build scripts and expect the SHA checks to pass, because we always pull the latest version, which may have changed. That prevents a customer from fully running a reproducibility test of our last GA with an SBoM, and therefore breaks our great story around reproducibility.

Related: https://github.com/adoptium/ci-jenkins-pipelines/issues/863

Most recent slack thread: https://adoptium.slack.com/archives/C09NW3L2J/p1703013340753489?thread_ts=1703010237.478249&cid=C09NW3L2J

sxa avatar Dec 20 '23 16:12 sxa

The artifacts may need a SHA256.txt, and possibly a GPG .sig. Maybe the filename needs a version in it?

andrew-m-leonard avatar Mar 13 '24 11:03 andrew-m-leonard

Tasks:

  • [x] Add code to generate a sha256 file for each jar.
  • [x] Test that.
  • [x] Add code to generate a version file for each jar.
  • [x] Test that.
  • [x] Add code to the build.xml file so we're downloading the sha rather than using duplicate hard-coded values.
  • [x] Test that.
  • [x] Add code to add the version and sha for each jar to the sbom
  • [x] Test that.
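
The first four tasks could be sketched roughly as follows. This is only an illustration, not the code that was merged: the `.sha256` and `.version` sidecar file names and the `write_sidecars` helper are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecars(jar: Path, version: str) -> Tuple[Path, Path]:
    """Write <jar>.sha256 and <jar>.version next to the jar.

    The sidecar naming is hypothetical; the real job may lay files out differently.
    """
    sha_file = jar.with_name(jar.name + ".sha256")
    version_file = jar.with_name(jar.name + ".version")
    # Same "digest, two spaces, file name" layout that sha256sum produces.
    sha_file.write_text(f"{sha256_of(jar)}  {jar.name}\n")
    version_file.write_text(version + "\n")
    return sha_file, version_file
```

Note that a digest generated this way only describes the file as it was downloaded; it says nothing about whether that file matched upstream.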

adamfarley avatar Mar 14 '24 11:03 adamfarley

Looking at the above list I have a couple of questions ...

  1. Why can't we retain the version number from the download? Deliberately removing it then holding a separate version file seems oddly complex.
  2. How are the SHAs being generated? If we're just creating them after downloading then that doesn't protect us against anything other than the transfer between jenkins and the build machines. Hard coded values in the build scripts seem like a much better idea to me unless I've misunderstood what is being attempted here.

sxa avatar Mar 14 '24 13:03 sxa

> 1. Why can't we retain the version number from the download? Deliberately removing it then holding a separate version file seems oddly complex.

Because it seemed simpler to me than attempting to parse the file name, especially since the getDependencies script already holds the versions separately.

And I think the version strings need to be separate because other version strings in the SBOM are already separate, e.g.:

    "tools" : [
      {
        "name" : "GLIBC",
        "version" : "2.17"
      },

> 2. How are the SHAs being generated? If we're just creating them after downloading then that doesn't protect us against anything other than the transfer between jenkins and the build machines. Hard coded values in the build scripts seem like a much better idea to me unless I've misunderstood what is being attempted here.

The SHAs are hard-coded in the getDependencies script for security, and are used to check that each download is intact.

We will be generating signatures as part of getDependencies, to remove the risk of a man-in-the-middle attack after the getDependencies download, when the jars are downloaded from Jenkins during SBOM creation preparation.

We are doing this to ensure security while, at the same time, avoiding having to copy and paste the SHAs into three different places (getDependencies, download during build, and SHA documentation in the SBOM).

Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.
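
For reference, the hard-coded check being discussed amounts to something like the sketch below. The `PINNED_SHA256` table, the jar name, and `verify_download` are hypothetical names for illustration; the real digests live in the getDependencies script.

```python
import hashlib
import hmac

# Hypothetical pinned table. In practice each value would be a digest copied
# from a trusted source, not computed from the download itself.
PINNED_SHA256 = {
    "example-dependency.jar": hashlib.sha256(b"example jar bytes").hexdigest(),
}


def verify_download(name: str, data: bytes) -> None:
    """Raise ValueError if the downloaded bytes do not match the pinned digest."""
    expected = PINNED_SHA256[name]
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(expected, actual):
        raise ValueError(
            f"SHA-256 mismatch for {name}: expected {expected}, got {actual}"
        )
```

Because the expected value is fixed ahead of time, a changed upstream artifact with the same name fails this check, which is exactly the situation described at the top of this issue.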

adamfarley avatar Mar 14 '24 14:03 adamfarley

> Because it seemed simpler to me than attempting to parse the file name, especially since the getDependencies script already holds the versions separate.

I still feel that renaming the file is more likely to cause confusion, but I won't block on it. However, I do think we have plenty of precedent for parsing and obtaining version numbers locally on the machine (especially with the strace output), and I'd personally feel more comfortable pulling it out on the live system if we can.
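
A minimal sketch of what parsing the version out of a retained file name might look like, assuming the jars keep an `<artifact>-<version>.jar` name with a purely numeric dotted version. The pattern and the `split_versioned_jar` helper are illustrative, not anything from the build scripts.

```python
import re
from typing import Optional, Tuple

# Assumed naming: <artifact>-<dotted numeric version>.jar
_VERSIONED_JAR = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)\.jar$")


def split_versioned_jar(filename: str) -> Optional[Tuple[str, str]]:
    """Return (artifact name, version), or None if no version is present."""
    match = _VERSIONED_JAR.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")
```

A name with the version already stripped, such as `example-dependency.jar`, returns `None`, so the caller can tell the two cases apart.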

> Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.

I'm not sure that the extra complexity of invoking GPG signing here (which I assume is what the "adoptium sig files" refers to) is preferable to just holding those SHAs in the build scripts as well as in this job. IMHO, ideally a consumer of our scripts should be able to use our processes while pulling directly from the upstream resources, instead of having to rely on our jenkins CI. Signing would make it harder for them to point at the upstream URL if desired, as that won't have the signatures that we'd be checking against.

sxa avatar Mar 14 '24 14:03 sxa

Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

sxa avatar Mar 14 '24 14:03 sxa

> Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

Not that I'm aware of, no. I think that should be a different issue.

adamfarley avatar Mar 14 '24 14:03 adamfarley

> I'd personally feel more comfortable with pulling it out on the live system if we can.

Either one works for me. Not fussed about adding parsing in. Will add that.

> Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.

> ...which I assume is what the "adoptium sig files" refers to

Yup.

> ...is preferable to just holding those SHAs in the build scripts as well as in this job.

Either way is fine.

> ...as that won't have the signatures that we'd be checking against.

Fair. Will centralise the SHAs in the build repo then.

adamfarley avatar Mar 14 '24 14:03 adamfarley

> Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

> Not that I'm aware of, no. I think that should be a different issue.

OK - feel free to split it out if desired, although that was part of the intended scope of this one as per the example in the description of this issue ;-)

Thanks for taking on the other tweaks.

sxa avatar Mar 14 '24 14:03 sxa

Update: Currently testing the set of code changes relating to sbom generation (documentation updates pending).

We now keep the cyclonedx dependency SHAs and version strings in a single location, making it easy for users to set their own SHAs and versions. These version strings will be included in the sbom automatically.
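
As a rough picture of how one central table can feed the sbom, here is a sketch that turns it into entries shaped like the "tools" fragment quoted earlier in this thread. The `DEPENDENCIES` table, its placeholder values, and `sbom_tool_entries` are assumptions for illustration; the real data lives in files under cyclonedx-lib.

```python
import json
from typing import Dict, List

# Hypothetical central table: one entry per sbom dependency jar.
# The version and digest below are placeholders, not real values.
DEPENDENCIES: Dict[str, Dict[str, str]] = {
    "example-dependency": {"version": "1.2.3", "sha256": "0" * 64},
}


def sbom_tool_entries(deps: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Build name/version entries, sorted by name for stable output."""
    return [
        {"name": name, "version": info["version"]}
        for name, info in sorted(deps.items())
    ]


if __name__ == "__main__":
    print(json.dumps({"tools": sbom_tool_entries(DEPENDENCIES)}, indent=2))
```

Sorting by name keeps the generated fragment identical between runs, which matters when comparing sboms for reproducibility.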

Users will also be able to download dependencies from their chosen source by modifying "sbom_dependency_default_location" in the cyclonedx-lib/build.xml file.

The getDependencies groovy script will also be improved to allow users to set their own preferred location for dependency storage. - done

Monday (2024-03-18) update: The improvements to the ant build.xml file that fetches the jars have been fixed. I've also fixed a typo in the build.sh file section that gathers the version strings and stores them in the sbom. Testing again.

Ok, that passed. Added documentation and an exclusion for the sbom dependency that we generate at runtime, as we don't have a version string for that. Final test run.

adamfarley avatar Mar 15 '24 15:03 adamfarley

TLDR:

The first step here is to centralise the SBOM dependency SHAs and version numbers in specific files. PR here.

This makes it easy for users to specify new versions and SHAs.

The second step (pending) will be to put the upstream location (with version wildcards) in similar "specific files", and to give both the build.xml and build.getDependencies the ability to use them (in the former case: only when the version file doesn't match the one in adoptium/temurin-build).
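
One way to picture an upstream location with a version placeholder, assuming the upstream is a repository using the standard Maven layout such as Maven Central. The `maven_central_url` helper and the coordinates in the usage note are hypothetical; the actual "specific files" format is still pending.

```python
def maven_central_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a versioned jar URL using the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    return (
        "https://repo1.maven.org/maven2/"
        f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"
    )
```

Calling `maven_central_url("org.example", "example-lib", "1.2.3")` yields a URL that always refers to that exact version, so an older build can fetch the jar it was originally built with.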

User POV: To change a dependency version in my build, I simply need to change the version number in "temurin-build/cyclonedx-lib/dependency_data/versions".

@sxa - What do you think? ~~Will version files be enough, or do I need to add the ability to set the version via a script argument?~~

Update: Will add a script argument. Step 2 will be actioned after I'm done with https://github.com/temurin-compliance/temurin-compliance/issues/474

adamfarley avatar Mar 18 '24 13:03 adamfarley

Note: The fix for the bugged dependency SHAs in the sbom has been separated out into a new PR for the sake of the March 2024 release (expedited review).

Master branch PR: https://github.com/adoptium/temurin-build/pull/3713 Release branch PR: https://github.com/adoptium/temurin-build/pull/3714

adamfarley avatar Mar 19 '24 14:03 adamfarley

As the sbom currently contains a link to the exact version of the temurin-build source code that generated a build, I don't think we need to specify the versions of the sbom dependencies if we're trying to reproduce a build (as the temurin-build repo already has that information).

This can be reopened if anyone thinks of another reason the sbom creation dependencies could need to be specified via command-line argument (as opposed to the source files).

adamfarley avatar Jul 03 '24 10:07 adamfarley