temurin-build

EPIC: Extend SBOM "formulation" to allow correct recipe for re-making...

Open andrew-m-leonard opened this issue 1 year ago • 5 comments

The intention of the CycloneDX "formulation" is to provide a "recipe" for "re-making" the exact same build. As it currently stands, the SBOM formulation section contains an strace analysis listing of the packages & tooling dependencies used in the original build. We need to add a new section for a "recipe" that provides the exact "configure & make" commands, along with how to create a "compatible" environment to re-build an identical build.

Tasks:

  • [ ] 1. Investigate the CycloneDX formulation spec and design how the Temurin OpenJDK build tasks (e.g. set up env, clone, configure, make, ...) could be described to enable "reproduction" of a build
  • [ ] 2. Design how the necessary formulation information would be obtained during a temurin build from the build scripts: build.sh, prepareWorkspace.sh, ...
  • [ ] 3. "hand construct" an "example" sbom.json with the expected "formulation" definitions as determined from tasks 1 & 2. Then validate it using the CycloneDX-cli validation tool
  • [ ] 4. Design a spec for the necessary changes to the TemurinGenSBOM.java app to support adding formulation tasks. E.g. which "operations" make sense for adding "formulation" sections to an SBOM?
  • [ ] 5. Update TemurinGenSBOM.java to support "formulation" generation, including unit tests in the ant make file: build.xml
  • [ ] 6. Update temurin-build scripts to generate SBOM formulations

andrew-m-leonard avatar Apr 04 '24 08:04 andrew-m-leonard

would like to work on this issue

angie-chang0 avatar Jun 02 '25 02:06 angie-chang0

Thank you @angie-chang0. The "Tasks" above list the main steps to complete this issue, but they assume some background knowledge of how to build Temurin, so I would recommend starting by trying to build Eclipse Temurin in your local environment (e.g. your laptop/PC). There are a few ways to do that, but probably the easiest is to use an Adoptium docker build container; you can follow the instructions here: https://github.com/adoptium/temurin-build/wiki/Building-OpenJDK-using-temurin-build-scripts-within-the-adopt-build-docker-container

If you don't want to use "docker", you can skip the docker steps 1-3 and just try to build Temurin using steps 4-7; however, you will need to install numerous tools and dependencies. That is doable: when the temurin-build script fails, it typically indicates which tooling is missing.

Feel free to leave comments/questions in this issue and I can assist, thank you.

andrew-m-leonard avatar Jun 02 '25 13:06 andrew-m-leonard

@angie-chang0 as a "first PR", so we can test your GitHub access, I recommend putting some initial thoughts into task (1), and then summarizing in a short paragraph, perhaps as a [TBD] statement, what we will be adding to the CycloneDX temurin-build support, in the README.md here: https://github.com/adoptium/temurin-build/tree/master/cyclonedx-lib/README.md e.g. maybe "[TBD] New --formulation to be added to TemurinGenSBOM to .......etc"

andrew-m-leonard avatar Jun 02 '25 13:06 andrew-m-leonard

I would like to work on this as well, I'll get started with the steps Andrew listed above

noelmamo avatar Jun 02 '25 15:06 noelmamo

We need to think about what a Temurin "formulation" might look like, and it may be worth doing some research into what other "projects" or SBOM examples you can find do.

My high-level thoughts: is the formulation a description of the steps from cloning "temurin-build" to running makejdk-any-platform.sh with the appropriate arguments? OR is it lower level than that, e.g. clone the openjdk source, run the autoconf "configure" step, run "make images" etc., i.e. the commands the temurin-build scripts invoke? I am thinking the latter, but I'm not sure, so some thought and research is probably required as to what the "standard" way is for a "formulation".

andrew-m-leonard avatar Jun 16 '25 13:06 andrew-m-leonard

@noelmamo @angie-chang0 One way we could look at this is:

The essence of how OpenJDK is "built" into a JDK binary involves a combination of Autoconf and GNU Make. The "process" consists of three principal steps:

  1. "Clone"
  2. "Configure"
  3. "Make"

In real terms, I can demonstrate with an example I ran on my Mac earlier:

  1. "Clone":
git clone git@github.com:adoptium/jdk21u
  2. "Configure":
bash ./configure --with-vendor-name="Eclipse Adoptium" --with-vendor-url=https://adoptium.net/ --with-vendor-bug-url=https://github.com/adoptium/adoptium-support/issues --with-vendor-vm-bug-url=https://github.com/adoptium/adoptium-support/issues --with-version-opt=202507231006 --with-version-pre=beta --with-version-build=32 --with-vendor-version-string=Temurin-25+32-202507231006 --with-boot-jdk=/Users/anleonar/workspace/temurin-build/jdk24/Contents/Home --enable-linkable-runtime --with-debug-level=release --with-native-debug-symbols=external --with-alsa=/Users/anleonar/workspace/temurin-build/workspace/./build//installedalsa --with-source-date=1753265212 --with-hotspot-build-time='2025-07-23 10:06:52' --disable-ccache --with-build-user=admin --with-jvm-variants=server --with-cacerts-src=/Users/anleonar/workspace/temurin-build/sbin/../security/certs  --disable-warnings-as-errors --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/  --with-freetype=bundled --with-zlib=bundled
  3. "Make":
make product-images

If someone using a Mac were to "clone" the exact same OpenJDK source code, use the same tooling (Xcode compiler, BootJDK, ALSA) as in the above, and run the "configure" and "make" commands above, it should in theory produce an identical "reproducible" JDK image binary.

So the "formulation" could, at least as a first pass, be a suitable "clone" and "configure" command, followed by a "make" command. The formulation could perhaps start with a "tooling check", for example: clang --version == "15.0.0"? Some parts are a little tricky, for example ALSA for mac and Windows is compiled from source during the build, but we could refine that later...
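
As a rough illustration of the "tooling check" idea, the Python sketch below compares the version a tool reports against the version recorded for the original build. The helper names `parse_version` and `check_tool` are made up for this example, and "15.0.0" is just the example value mentioned above; none of this is existing temurin-build code.

```python
import re
import subprocess
from typing import List, Optional


def parse_version(output: str) -> Optional[str]:
    """Return the first dotted version number found in a tool's version output."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    return match.group(0) if match else None


def check_tool(command: List[str], expected: str) -> bool:
    """Run e.g. ["clang", "--version"] and compare the reported version.

    Hypothetical helper: returns False if the tool cannot be run or if it
    reports a different version than the one recorded for the original build.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
    except OSError:
        return False  # the tool is missing or not executable
    return parse_version(result.stdout + result.stderr) == expected


if __name__ == "__main__":
    print("clang matches:", check_tool(["clang", "--version"], "15.0.0"))
```

A real check would probably need per-tool parsing, since tools format their version output differently; the regular expression here simply takes the first dotted number it finds.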

andrew-m-leonard avatar Jul 23 '25 15:07 andrew-m-leonard

@noelmamo @angie-chang0 An alternative is to express the formulation as directives for running the temurin-build scripts directly, which is probably simpler:

eg.

git clone git@github.com:adoptium/temurin-build
cd temurin-build
bash ./makejdk-any-platform.sh jdk21u

but with the necessary setup and parameters to build the same identical build as defined in the SBOM.

This is probably the simpler way that we should proceed with...

andrew-m-leonard avatar Jul 23 '25 15:07 andrew-m-leonard

Below is the structure I am following to implement a formulation for Temurin. Each formulation entry will contain:

  • A top level 'formula' node, such as "temurin-jdk17u-arm64"
  • One or more 'components', each representing an input or tool:
    • Type: "script", "tool", "source", "platform", etc
    • Properties: metadata such as path, invokedBy, url, commit, arch, image, etc

This follows the CycloneDX 1.6 formulation.


  1. Design formulation naming strategy

    • Example: "temurin-jdk17u-arm64" to reflect build target
    • May expand to include build date, CI job ID, or other relevant data
  2. Define key component categories

    • Build scripts (such as make-adopt-build-farm.sh)
    • Source repositories (temurin-build.git)
    • Build tools (including autoconf, make)
    • Platform/environment (Docker image or host OS)
  3. Define a minimal set of properties per component

    • For a script: path, invokedBy
    • For a tool: version, type
    • For the environment: arch, os, image
  4. Ensure support for both Docker and non-Docker builds

    • Capture docker image or host OS conditionally
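
The structure outlined above could be modelled roughly as follows. This is only a sketch in Python using plain dictionaries: `make_component` and `make_formula` are hypothetical helpers, the angle-bracket values are placeholders, and the category ("script", "tool", "source", "platform") is stored as a property because how it would map onto the CycloneDX component `type` field is left open here.

```python
import json


def make_component(name: str, category: str, **props: str) -> dict:
    """Build one formulation component with name/value properties (hypothetical helper)."""
    properties = [{"name": "category", "value": category}]
    # Sort the remaining properties so the generated output is deterministic.
    properties += [{"name": key, "value": value} for key, value in sorted(props.items())]
    return {"name": name, "properties": properties}


def make_formula(bom_ref: str, components: list) -> dict:
    """Wrap the components in a single top-level formula entry."""
    return {"bom-ref": bom_ref, "components": components}


formula = make_formula(
    "temurin-jdk17u-arm64",
    [
        make_component("make-adopt-build-farm.sh", "script", path="<path>", invokedBy="<caller>"),
        make_component("temurin-build.git", "source", url="<url>", commit="<commit>"),
        make_component("make", "tool", version="<version>"),
        make_component("build-environment", "platform", arch="arm64", os="<os>", image="<image>"),
    ],
)

print(json.dumps({"formulation": [formula]}, indent=2))
```

For a non-Docker build, the `image` property would simply be omitted from the platform component.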

I have been working on adding a working implementation of this into TemurinGenSBOM.java and will submit a draft PR of what it ends up as hopefully by the end of today.

noelmamo avatar Jul 28 '25 13:07 noelmamo

Excellent, thank you @noelmamo

andrew-m-leonard avatar Jul 28 '25 15:07 andrew-m-leonard

Following the design specified by @noelmamo and the CycloneDX 1.6 formulation, it would look something like the below. This is a work in progress for a PR:

{
  "formulation": {
    "directives": {
      "type": "temurin-build-script",
      "version": "1.0",
      "commands": [
        "git clone git@github.com:adoptium/temurin-build",
        "cd temurin-build",
        "bash ./makejdk-any-platform.sh jdk21u --with-version-string=21.0.2+13-202312052047 --with-vendor-version-string=202312052047"
      ],
      "environment": {
        "required_tools": ["git", "bash", "make"],
        "platform_specific": {
          "linux": "additional_linux_deps",
          "windows": "additional_windows_deps"
        }
      },
      "parameters": {
        "jdk_version": "21u",
        "build_number": "202312052047",
        "vendor_string": "202312052047"
      }
    }
  }
}

angie-chang0 avatar Jul 29 '25 05:07 angie-chang0

Looks good, thank you @angie-chang0. The key aspect of this is getting the parameters to makejdk-any-platform.sh exactly right, so that the binary is reproduced 100% identically. I'm not so worried about a detailed environment for a first pass.
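
One way to keep the recorded parameters and the final command line in step is to generate the command from the parameters rather than writing it by hand. The Python sketch below assumes a hypothetical helper, `makejdk_command`; the two flags are copied from the example above purely for illustration and are not a complete or verified argument list.

```python
import shlex


def makejdk_command(jdk_version: str, build_args: dict) -> str:
    """Assemble the makejdk-any-platform.sh invocation from recorded build arguments."""
    argv = ["bash", "./makejdk-any-platform.sh", jdk_version]
    # Dicts preserve insertion order, so the flags appear in the order they were recorded.
    argv += [f"{flag}={value}" for flag, value in build_args.items()]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(argv)


recorded_args = {
    "--with-version-string": "21.0.2+13-202312052047",
    "--with-vendor-version-string": "202312052047",
}

print(makejdk_command("jdk21u", recorded_args))
```

Quoting starts to matter as soon as a recorded value contains spaces, which is why the sketch goes through `shlex.join` instead of joining the pieces with plain spaces.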

andrew-m-leonard avatar Jul 29 '25 08:07 andrew-m-leonard

Hello @andrew-m-leonard @angie-chang0 @noelmamo ,

I've translated the snippet from @angie-chang0 into a CycloneDX 1.6 formulation/workflow entry and validated it with the CycloneDX CLI.

  • No changes to existing SBOM fields.
  • Added a new formula under "formulation" that represents the three commands as a workflow with the required fields.
  • Preserved the original "directives" parameters as properties on the formula.

Note that this is an additional entry appended to the existing formulation array in the current SBOM (hence the indentation).

    {
      "bom-ref" : "formula_temurin_build_script_1.0_jdk21u",
      "properties" : [
        {"name" : "type", "value" : "temurin-build-script"},
        {"name" : "version", "value" : "1.0"},

        {"name" : "parameter.jdk_version", "value" : "21u"},
        {"name" : "parameter.build_number", "value" : "202312052047"},
        {"name" : "parameter.vendor_string", "value" : "202312052047"},
        
        {"name" : "environment.required_tools", "value" : "git,bash,make"},
        {"name" : "environment.platform_specific.linux", "value" : "additional_linux_deps"},
        {"name" : "environment.platform_specific.windows", "value" : "additional_windows_deps"}
      ],
      "workflows" : [
        {
          "bom-ref" : "workflow_temurin_build_script_1.0_jdk21u",
          "uid" : "f1",
          "name" : "temurin build script 1.0 for jdk21u",
          "taskTypes" : ["clone", "build"],
          "steps" : [
            {
              "name" : "clone repo",
              "description" : "clone repository",
              "commands" : [
                {"executed" : "git clone git@github.com:adoptium/temurin-build"}
              ]
            },
            {
              "name" : "cd into repository",
              "description" : "cd into temurin-build",
              "commands" : [
                {"executed" : "cd temurin-build"}
              ]
            },
            {
              "name" : "makejdk",
              "description" : "execute makejdk-any-platform.sh",
              "commands" : [
                {"executed" : "bash ./makejdk-any-platform.sh jdk21u --with-version-string=21.0.2+13-202312052047 --with-vendor-version-string=202312052047"}
              ]
            }
          ]
        }
      ]
    }
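
The translation from a flat list of commands into workflow steps could be automated along these lines. This is a minimal Python sketch, not the TemurinGenSBOM implementation: `commands_to_workflow` is a hypothetical helper, and the step names are generated placeholders rather than the descriptive names used in the hand-written entry above.

```python
import json


def commands_to_workflow(bom_ref: str, uid: str, name: str,
                         task_types: list, commands: list) -> dict:
    """Turn an ordered list of shell commands into a workflow with one step per command."""
    steps = [
        {"name": f"step {index}", "commands": [{"executed": command}]}
        for index, command in enumerate(commands, start=1)
    ]
    return {
        "bom-ref": bom_ref,
        "uid": uid,
        "name": name,
        "taskTypes": task_types,
        "steps": steps,
    }


workflow = commands_to_workflow(
    "workflow_temurin_build_script_1.0_jdk21u",
    "f1",
    "temurin build script 1.0 for jdk21u",
    ["clone", "build"],
    [
        "git clone git@github.com:adoptium/temurin-build",
        "cd temurin-build",
        "bash ./makejdk-any-platform.sh jdk21u",
    ],
)

print(json.dumps(workflow, indent=2))
```

The generated entry would still need to be checked with the CycloneDX CLI, as was done for the hand-written one.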

I appreciate any critique or feedback (naming conventions, placement, additional/different fields, uid) and I'm happy to iterate on this.

Lukisorisch avatar Aug 14 '25 21:08 Lukisorisch

Possibly something to be aware of / give input to: https://github.com/CycloneDX/specification/issues/565

smlambert avatar Aug 28 '25 15:08 smlambert