Ds08msa internal v0.4
Rebased and moved to new branch name from https://github.com/Dasharo/coreboot/pull/573
We have to decide what changes here make sense. I already wrote a justification for the use of Dasharo SDK here: https://github.com/Dasharo/dasharo-sdk/pull/9
In the long run, I hope we can replace dasharo-sdk with something like stageX.
@miczyg1 @SergiiDmytruk I get merge conflict because of CONFIG_EDK2_TAG_OR_REV change. I'm unsure if I can afford to use untested ref for training; I prefer not. My preference is to use v0.9.0 tag.
What is our policy in such a situation? In theory, this PR should not be reviewed because of a merge conflict.
EDIT: now I see: 7dbfe58ba5dc08e07c253e53b5c1bfed7758ddf6 is not a valid git reference but it was during the Odroid v0.9.0 release.
EDIT2: this is a problem with my fancy configuration with EDKII local path
I get merge conflict because of CONFIG_EDK2_TAG_OR_REV change.
I think all boards are updated at the same time largely because not doing it can cause a board to fail building as a result EDK2-related changes in coreboot.
I'm unsure if I can afford to use untested ref for training; I prefer not. My preference is to use v0.9.0 tag.
Tag will use an older revision anyway, so I don't really see a problem: tag uses tested EDK, dasharo uses latest EDK.
I think all boards are updated at the same time largely because not doing it can cause a board to fail building as a result EDK2-related changes in coreboot.
Understood. I probably should better split this PR between what I relay on for training vs what could be useful for others. Obviously my work cannot be proven by reproducibility because we don't build timeless and hash of current commit is taken into build.
So what I'm really looking is advise how this PR should be submitted, because first I should not request review if I have merge request (we know how it would end), and second I would like not to broke other uses cases. AFAIK we don't have any tests to confirm if this script keep working as expected.
Tag will use an older revision anyway, so I don't really see a problem: tag uses tested EDK, dasharo uses latest EDK.
You're right somehow I though that was hash before rebase, but it was my mistake instead of hash which disapeared due to rebase. Sorry for the noise.
Obviously my work cannot be proven by reproducibility because we don't build timeless and hash of current commit is taken into build.
Why? If we build from the same commit using the same SDK, it should give the same result. Why should we care if N code revisions/commits give the same hash or not? To me a different commit means different code and shouldn't give the same hash.
So what I'm really looking is advise how this PR should be submitted, because first I should not request review if I have merge request (we know how it would end), and second I would like not to broke other uses cases. AFAIK we don't have any tests to confirm if this script keep working as expected.
I'd just rebase this branch onto dasharo. build.sh is used by CI to build Protectli boards. You're not very likely to break anything with the current copy&paste style of the script.
Why? If we build from the same commit using the same SDK, it should give the same result. Why should we care if N code revisions/commits give the same hash or not? To me a different commit means different code and shouldn't give the same hash.
For years, I've disliked coreboot-sdk and how it impacts the whole building story in the long run, including the reproducibility of older builds. We essentially can't achieve this with the coreboot build environment: https://reproducible-builds.org/docs/perimeter/ This has to be changed.
For various reasons, I cannot build from the same SDK:
- coreboot-sdk:2024-02-18_732134932b has 7.46GB - this is too much when I have to use it for a reproducible training environment - here I made some steps to minimize it, and I think we can create even more minimalistic toolchain
- coreboot-sdk is not reproducible, dasharo-sdk also, but dasharo-sdk would be an abstraction which we replace with stagex (or something better if there will be a candidate)
For various reasons, I may have issues with building from the same commit - dasharo-pq.
The code here makes no change to Odroid-H4+. I changed only the toolchain (not really since I still rely on the same compiler from coreboot-sdk 2024-02-18_732134932b). I'm getting a different hash because we consider the hash of a commit something that should be part of the final binary instead of using BUILD_TIMELESS.
Maybe this is not important to you; maybe my use cases are niche and unnecessary, but this PR contains a couple of things that help me:
- speed up the build process by not fetching edk2 but having copy locally,
- no Internet access is needed if things are correctly prepared repository TBH, I need two things: check iPXE and correctly provide a patch for EDKII before entering
--network nonein my repo, - I can use DASHARO_SDK, which I like for a given build,
Some of those features have generic value for the whole project, so I'm asking whether and how I could contribute to those changes. Or, more directly, if needed, how to split this PR so some of those required features can be merged without affecting everything else (hence the lack of testability in this area of code?).
EDIT: I edited airgap requirement
@miczyg1 @SergiiDmytruk ?
@pietrushnic This script build.sh is for mere mortals, not devs. It is supposed to make building simple and robust. That is why:
- We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.
- We CAN change to dasharo-sdk, because it has all the xgccs we need. No problem.
- We CAN check out the payloads' sources before starting the build and after creating full config with
make olddefconfig. Doesn't really matter if the final make does it, or it is done before that. So generally--network noneis fine.
The code here makes no change to Odroid-H4+. I changed only the toolchain (not really since I still rely on the same compiler from coreboot-sdk 2024-02-18_732134932b). I'm getting a different hash because we consider the hash of a commit something that should be part of the final binary instead of using BUILD_TIMELESS.
And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS. Building with build.sh and without BUILD_TIMELESS ensures that everybody build the binaries the same way we do. We also avoid entering a support hell where everybody has a different way of building binaries.
We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.
Fair point. I can do that modification. I can make it transparent for the user by:
- Cloning edk2 in build script - this can be done by parsing config files or introducing checkout step to edk2 makefiles
- checkout ipxe with the network before doing the airgap build
Is that acceptable?
And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS.
To clarify:
- The toolchain is not different. I'm using the same coreboot-sdk, but instead of taking the container, I'm just extracting what is used. So, GCC and other toolchain components are used in the build without dasharo-sdk.
- The built environment should not have an impact on the build result. That would be the ideal situation. It is not impossible. If the build environment impacts build results, we should have a reproducible build environment. If the build environment impacts build results and we cannot reproduce the build environment, the design is broken.
- Also, Martin Roth proved at some point that multiple toolchain versions generate the same binary hash, and a new toolchain can build old code. Yes, this is luck and not something we should rely on. It is documented in leadership notes, IIRC. This is not an argument, just a point such things are possible and maybe even more common than we suspect.
To the point how BUILD_TIMELESS helps, lack of it does not help:
- Right now, because of not using BUILD_TIMELESS changes around the toolchain cannot be verified because it does not matter if we use the same toolchain or not, and no matter if it introduces a change to the build result, we get a different build result hash because git hash is in binary. So, we cannot verify if the changes introduced are because of toolchain changes.
We could also consider how the source version control system is part of the build environment. It is no longer part of the build environment source code that should be able to exist with or without SVC. The fact that the build environment is mangled with SVC is a terrible design, leading to issues described above, such as the same code with the same compiler giving you different build results. Please consider how crazy it is that zero change in code gives different build results because even git amend with no change to code would lead to a change in build result hash. This is very fragile.
I understand this can be out of the scope of the build.sh, and that is fine. I also know that those problems may not be important to Dasharo developers with different issues. I'm building awareness and a path for my use cases, not being the enemy of the primary use case but being able to coexist.
Building with build.sh and without BUILD_TIMELESS ensures that everybody build the binaries the same way we do. We also avoid entering a support hell where everybody has a different way of building binaries.
We already have situations where not everyone builds the binaries similarly, but let's leave it alone. You are correct that this is most common and adopted for most Dasharo-supported hardware build processes. We can also agree that if it doesn't work for me, it is my problem, so Dasharo developers should not get more work from my use cases (training, dasharo-pq). That's why I'm asking for feedback on how to achieve that in a non-interfered way. I could create my build.sh? I need that solution upstream because I build the whole infrastructure on that (PET and dasharo-pq).
We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.
Fair point. I can do that modification. I can make it transparent for the user by:
- Cloning edk2 in build script - this can be done by parsing config files or introducing checkout step to edk2 makefiles
It would simply need make -C payloads/external/edk2 payloads/external/edk2/workspace/Dasharo or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo (not sure what CURDIR result will be). I don't think you have to modify Makefile. Also make -C payloads/external/edk2 payloads/external/edk2/workspace/edk2-platforms or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/edk2-platforms if edk2-platforms is used (can be obtained from config too)
- checkout ipxe with the network before doing the airgap build
Is that acceptable?
Yes.
And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS.
To clarify:
- The toolchain is not different. I'm using the same coreboot-sdk, but instead of taking the container, I'm just extracting what is used. So, GCC and other toolchain components are used in the build without dasharo-sdk.
Not in this particular case, but I was saying rather hypothetically.
- The built environment should not have an impact on the build result. That would be the ideal situation. It is not impossible. If the build environment impacts build results, we should have a reproducible build environment. If the build environment impacts build results and we cannot reproduce the build environment, the design is broken.
I'm not an GCC expert, but newer GCC version may have different features, optimization, security fixes which may impact how final binary looks like after compilation. That's what I am trying to say.
- Also, Martin Roth proved at some point that multiple toolchain versions generate the same binary hash, and a new toolchain can build old code. Yes, this is luck and not something we should rely on. It is documented in leadership notes, IIRC. This is not an argument, just a point such things are possible and maybe even more common than we suspect.
I can't agree with and a new toolchain can build old code.. Maybe it will build coreboot, but it is easy to break the build of a payload with newer toolchain.
To the point how BUILD_TIMELESS helps, lack of it does not help:
- Right now, because of not using BUILD_TIMELESS changes around the toolchain cannot be verified because it does not matter if we use the same toolchain or not, and no matter if it introduces a change to the build result, we get a different build result hash because git hash is in binary. So, we cannot verify if the changes introduced are because of toolchain changes.
Why? You can do BUILD_TIMELESS using any toolchain you want and have that binary as a reference, then do whatever you want with toolchain, use BUILD_TIMELESS again and compare the results. You are not forced to use build.sh, manually invoking commands will work too. As simple as that.
We could also consider how the source version control system is part of the build environment. It is no longer part of the build environment source code that should be able to exist with or without SVC. The fact that the build environment is mangled with SVC is a terrible design, leading to issues described above, such as the same code with the same compiler giving you different build results. Please consider how crazy it is that zero change in code gives different build results because even git amend with no change to code would lead to a change in build result hash. This is very fragile.
But at least the BIOS release date in SMBIOS is correct :)
We already have situations where not everyone builds the binaries similarly, but let's leave it alone. You are correct that this is most common and adopted for most Dasharo-supported hardware build processes. We can also agree that if it doesn't work for me, it is my problem, so Dasharo developers should not get more work from my use cases (training, dasharo-pq). That's why I'm asking for feedback on how to achieve that in a non-interfered way. I could create my build.sh? I need that solution upstream because I build the whole infrastructure on that (PET and dasharo-pq).
Yes, not everyone builds the binaries the same way. But we can conclude it with one sentence "please use build.sh" and the end of topic.
If you need the BUILD_TIMELESS, then add it to the script or make the script pass the environments BUILD_TIMELESS to the docker container. I'm fine with that. However, no matter what we do, we will not fix the past, i.e. no way to get the same hash as release binary if we modify anything in the tree (build.sh including). Thus as I proposed earlier, you can always build the reference with BUILD_TIMELESS by manually pasting commands. Then re-check against this reference after any modifications you make.
It would simply need make -C payloads/external/edk2 payloads/external/edk2/workspace/Dasharo or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo (not sure what CURDIR result will be). I don't think you have to modify Makefile. Also make -C payloads/external/edk2 payloads/external/edk2/workspace/edk2-platforms or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/edk2-platforms if edk2-platforms is used (can be obtained from config too)
Thanks. I will add it.
I can't agree with and a new toolchain can build old code.. Maybe it will build coreboot, but it is easy to break the build of a payload with newer toolchain.
This was probably tested with the QEMU target with coreboot. Toolchain often breaks with our EDKII, which is mainly our issue, not upstream. If iPXE breaks, then upstream fix that as part of the bump of iPXE. Not sure about other payloads. However, this proves that the toolchain has a problem because it has to build a coreboot and many different projects.
As I said, this is not an argument, so there is no need to rebut.
Why? You can do BUILD_TIMELESS using any toolchain you want and have that binary as a reference, then do whatever you want with toolchain, use BUILD_TIMELESS again and compare the results. You are not forced to use build.sh, manually invoking commands will work too. As simple as that.
Of course, I ended up with more work on explaining all the necessary details to students and dealing with the support of it live during the class. Also, my CI/CD related to dasharo-pq and PET have to be coded differently than what is used by most developers. I don't say it is not possible. It is, but I have to give up the infrastructure used by all other developers and do some more work. Before that, I wanted to try this path. I understand you say BUILD_TIMELESS is not a good fit for build.sh. That is fine. I will try to find another way. I have a gut feeling this discussion will get back to us sooner or later in another form, but maybe right now, let's give it up.
If you need the BUILD_TIMELESS, then add it to the script or make the script pass the environments BUILD_TIMELESS to the docker container. I'm fine with that. However, no matter what we do, we will not fix the past, i.e. no way to get the same hash as release binary if we modify anything in the tree (build.sh including). Thus as I proposed earlier, you can always build the reference with BUILD_TIMELESS by manually pasting commands. Then re-check against this reference after any modifications you make.
Yes, I will follow both:
- I will extend help to explain additional environment variables considered during the build or add parameters (airgap, DASHARO_SDK, BUILD_TIMELESS) so the interface will be backward compatible.
- I will compare only BUILD_TIMELESS hashes before and after changes to confirm that infrastructure changes didn't affect build results.
@miczyg1 I simplified the whole change into three things:
- Switch to using DASHARO_SDK for all platforms: https://github.com/Dasharo/coreboot/pull/619/commits/7aafa91ce454450dddb3226524adcc13d9c2f300 - it is based on the same coreboot sdk, so there should be no issues.
- Through AIRGAP flag introduce airgap mode, which I explained in commit message: https://github.com/Dasharo/coreboot/pull/619/commits/9d6d9c27185c18f0730e81ff765193c26c15c166
- Add support for building odroid and its btg version: https://github.com/Dasharo/coreboot/pull/619/commits/bb1b9a05ab245d6c5a3fa163ad721abd51c3d300
Please let me know if this is better and match Dasharo needs.
@miczyg1 I simplified the whole change into three things:
- Switch to using DASHARO_SDK for all platforms: 7aafa91 - it is based on the same coreboot sdk, so there should be no issues.
We cannot use latest, because it is introducing a moving target of the SDK. We have to tag it first.
- Through AIRGAP flag introduce airgap mode, which I explained in commit message: 9d6d9c2
- Add support for building odroid and its btg version: bb1b9a0
Please let me know if this is better and match Dasharo needs.
@pietrushnic since you change the SDK in build.sh, it would be wise to change it in GH action workflows as well. That way CI will build test everything using your new SDK
Yeah, I would be glad to have time to get back to this. If you have time to fix it, please do; otherwise, it will have to wait for me, or we can ask someone.
The roadmap is to get back to this in July/August 2025 unless someone picks it up and fixes it, since it is already annoying.
Changed CI to use build.sh, so the old build methods still work as expected.
Also tested the timeles build by:
-
BUILD_TIMELESS=1 ./build.sh vp2420 -
echo dummy > dummy && git add dummy && git commit -m "dummy" -
BUILD_TIMELESS=1 ./build.sh vp2420 - Compare binaries from step 1 and 3 and verify they are the same.
Hashes I got:
fa2505fa3f96f2fdbba787052b1869b68e90ba0c2288697534e9ec5d2aeea6f6 test1.rom
fa2505fa3f96f2fdbba787052b1869b68e90ba0c2288697534e9ec5d2aeea6f6 test2.rom
First built from d6d16f17a364d1b244eec2e819f15c960297de41 second after committing dummy file, so I believe it works.
To be 150% sure, you could add step 5. build without BUILD_TIMELESS defined. That should fail witha different hash.
To be 150% sure, you could add step 5. build without BUILD_TIMELESS defined. That should fail witha different hash.
yes, I was going to do that now.
The non-timeless builds:
6b75d605b3baf6016535de4f0713267498ac043b970d33e893a8e0dfd7760130 test1.rom
f919b9d2744eb5b6ecb444dfa2f85d0b3f9baf1f85f2bf0709997007a16d020f test2.rom
First without dummy commit, second with dummy commit. As expected, hashes differ.
Rebased on top of target branch, once CI passes, will merge