lightning icon indicating copy to clipboard operation
lightning copied to clipboard

Repro nightly builds

Open s373nZ opened this issue 1 year ago • 16 comments

Repro nightly builds

Description

This pull request attempts to address #7117 to implement Github Actions CI/CD to perform nightly reproducible builds. The goal is to catch local file changes that might emerge through the build process and result in a -modded release version. Currently supporting focal, jammy and noble Ubuntu releases.

Have been testing by adding the following to the on key of .github/workflows/repro.yml:

  push:
    branches:
      - 7117-repro-nightly-builds

Add adding some temporary commit to dirty the tree in the contrib/reprobuild/Dockerfile.[release]

Related Issues

  • Closes #7117 and hopes to score the bounty :)

Checklist

Ensure the following tasks are completed before submitting the PR:

  • [x] Changelog has been added in relevant commits.
  • [x] Tests have been added or updated to cover the changes.
  • [x] Documentation has been updated as needed.
  • [x] Any relevant comments or TODOs have been addressed or removed.

Additional Notes

  • Added an entry to the CHANGELOG -- it was the first one observed since the last release, so also added the front matter.
  • I started this work a few days ago, without realizing there was a "expiration" date on the Sphinx bounty entry for Aug. 2, 2024. Just before submitting this PR, and rebasing against master, I've noticed @ShahanaFarooqui has made a few commits regarding the repro build process. Don't mean to be stepping on toes! I don't know if this work overlaps, and I'm hoping it still qualifies for the bounty, but if not, please consider it as an alternative submission to whatever planning around repro builds made since Aug. 2.

s373nZ avatar Sep 07 '24 18:09 s373nZ

The build failures appear to be in different place each time, and should not be related to my changes, so I assume they are inherited in CI build failures bubbling up from new changes to master over the weekend. I'll do my best to monitor the CI and rebase opportunely. Otherwise, please advise.

s373nZ avatar Sep 09 '24 09:09 s373nZ

Some reformatting of the commands and minor syntactical optimizations.

s373nZ avatar Sep 13 '24 15:09 s373nZ

Removed modifications to the CHANGELOG.md and opted for a Changelog-None hint in the commit message. Users probably don't care about all the CI processes...

s373nZ avatar Sep 17 '24 11:09 s373nZ

@ShahanaFarooqui Thanks for having a look! I'll address at your change requests as soon as I can, but it might take some time. As your question on testing is important, I'm leaving some preliminary feedback here so it's not wrapped up with the review comments.

I've been testing failed builds using this branch, which adds an extra commit to write a line in the Cargo.toml file just prior to the tools/repro-build.sh script being executed by the Dockerfile command. Here is an example failed run.

One interesting and potentially tricky aspect of this PR is this comment in the Dockerfiles -- that first the repo is cloned to the /repo directory by the CI/CD, and then it is cloned again from from that local clone to the "working directory" of the Dockerfile (/build). I remember that this made it a little interesting to figure out a testing plan. Also, I wasn't quite sure how to force a build resulting in a dirty tree. I'm not sure if that has an impact on your own test case, but it's something to be aware of.

I'll try to reproduce your test case as well. Is it safe to say that if I git revert 21fcd45 on my testing branch, you would expect the build to produce a dirty tree?

s373nZ avatar Sep 26 '24 19:09 s373nZ

I'll try to reproduce your test case as well. Is it safe to say that if I git revert 21fcd45 on my testing branch, you would expect the build to produce a dirty tree?

Yes; due to commit 21fcd4594b9db99017e6c6f4cca750566d1b2969, CLN v24.08.1 builds are released with -modded.

ShahanaFarooqui avatar Sep 26 '24 20:09 ShahanaFarooqui

@ShahanaFarooqui I tried to reproduce the -modded build locally using my 7117-repro-nightly-builds-test branch which contains 21fcd45 reverted using the Reproducible builds documentation, but it didn't produce a -modded build for me. I am happy to keep trying, but before I sink too much more time into it, are you able to provide any hints on how the $V variable that is found in the Makefile here is used, and when / how it should be set during the Repro build process to trigger the comment change in cln-rpc/src/model.rs?

Also, do any other test-cases which produce -modded build come to mind that we can try in tandem?

s373nZ avatar Oct 01 '24 12:10 s373nZ

@ShahanaFarooqui I tried to reproduce the -modded build locally using my 7117-repro-nightly-builds-test branch which contains 21fcd45 reverted using the Reproducible builds documentation, but it didn't produce a -modded build for me.

I pulled your branch s373nZ:7117-repro-nightly-builds and it is generating the -modded build locally with below steps:

poetry shell
poetry install
./configure
make

I am happy to keep trying, but before I sink too much more time into it, are you able to provide any hints on how the $V variable that is found in the Makefile here is used, and when / how it should be set during the Repro build process to trigger the comment change in cln-rpc/src/model.rs?

$V variable is to set verbose level (default is 0). I never used it in CLN but you can try it with make V=1. We do not need to set it in the process as default non-verbose run is good enough.

Also, do any other test-cases which produce -modded build come to mind that we can try in tandem?

All commits without tags will only generate -modded versions. Therefore, your current commit is generating a modded version too ( v24.08-65-g4009340-modded). You can see how VERSION is calculated here.

ShahanaFarooqui avatar Oct 02 '24 05:10 ShahanaFarooqui

Apologies in advance for the looooooong comment.

I pulled your branch s373nZ:7117-repro-nightly-builds and it is generating the -modded build locally with below steps...

I followed the steps you describe and replicated the results you report. The version number as computed in the Makefile is indeed, -modded. However, I'm starting to feel confusion regarding the requirements of this task as detailed in in #7117:

  1. The developer tags the release
  2. Then uploads the tag
  3. Builds the reprobuild, but during the build something got changed
  4. The version string now is something like v24.02-modded due to the dirty tree.

My reading of the issue description, particularly #3, is that the Repro build process is performed according to the documentation here. So, my local testing process so far has been:

sudo docker run --rm -v $(pwd):/build ubuntu:noble \
	bash -c "apt-get update && apt-get install -y debootstrap && debootstrap noble /build/noble"
sudo tar -C noble -c . | sudo docker import - noble
sudo docker build -t cl-repro-noble - < contrib/reprobuild/Dockerfile.noble

mkdir -p release
sudo docker run --rm -v $(pwd):/repo -e FORCE_MTIME=$(date +%F) -ti cl-repro-noble

And then I check the name of the artifact tarball in release/ to see if it has been -modded. The extra FORCE_MTIME variable is set because it is not possible to derive the date from CHANGELOG.md unless more formal release steps have been taken.

After testing both methods:

  • Via your method of testing, via a local build, the VERSION is indeed computed to be -modded.
  • Via my method of testing, using Docker under the Repro documentation, the VERSION is not -modded.

I looked into the reason for the difference and it seems to be related to two particularities under the Docker setup:

  1. As mentioned in https://github.com/ElementsProject/lightning/pull/7651#issuecomment-2385702743, the Docker Repro process checks out a fresh copy of the local repository, ignoring any local modifications.
  2. The VERSION for the Docker Repro process is determined here in repro-build.sh, BEFORE the compilation happens.

In the work submitted so far, I've taken special care not to modify any of the Repro process scripts and tools for fear of unintended consequences.

Questions

The below (somewhat rhetorical) questions follow from this research:

  1. Which way are the Repro builds being performed in practice? And which method of testing is deterministic of success against the requirements in #7117?
    • If some methodology performing local builds is used, should the Repro documentation be updated, and should a local build be performed in the Github action instead of the Repro build?
    • If the Docker method in the docs is used, under what circumstances would a dirty tree from a local build become an issue with a release?
  2. Is it possible that computing the VERSION in repro-build.sh prior to performing the build is considered a bug?
    • Maybe a refactor to change the Dockerfiles to perform a fresh repo checkout in this commit had the unintended consequence of missing local changes when computing the initial VERSION?
    • I would take a shot at changing the repro-build.sh script to compute the VERSION after the build is performed, but I see a conflict because VERSION is passed in to the make command.
  3. According to your last comment, All commits without tags will only generate -modded versions. If this is the case, wouldn't ALL nightly builds produce -modded as they aren't tagged, and thus negate the value of having nightly builds? When building under master, the version is not -modded, but that branch also doesn't contain post-build local changes.

Possible Solutions

  • Add another step to the action which 1) performs a "regular" local build prior to the Repro build, 2) computes the VERSION and passes it in as an override variable FORCE_VERSION to 3) make the Repro build use this value for the test release.
    • Concerned this is redundant and bypasses the Repro build process too much.
  • Remove the Docker Repro build functionality and replace it with a local build instead.
    • Concerned this is not reflecting the Repro build steps at all as per the documentation.

Thanks again for your help and patience on this. Let me know if I'm missing something or you have feedback on the requirements or next steps. Happy to catch you on Discord sometime to discuss as well.

sidenote: I noticed the bash command to compute the VERSION differs slightly between the Makefile here and the repro-build.sh script here. Should these be uniform?

s373nZ avatar Oct 02 '24 13:10 s373nZ

@s373nZ It seems there might be some misunderstanding regarding both the requirements and the approach. Let's discuss it on Discord first, and then we can update the final decision here. Are you on CLN's Discord server?

ShahanaFarooqui avatar Oct 03 '24 02:10 ShahanaFarooqui

All commits without tags will only generate -modded versions. Therefore, your current commit is generating a modded version too ( v24.08-65-g4009340-modded). You can see how VERSION is calculated here.

Apologies for the confusion between a dirty tree (-modded) and the number of commits with the commit hash suffix. Each new commit increases the commit count and updates the commit hash, but it does not trigger the --dirty flag unless there are uncommitted changes in the working directory.

ShahanaFarooqui avatar Oct 03 '24 18:10 ShahanaFarooqui

Now that we agree on the difference between the commit hash and the -modded appearing in the VERSION, Shahana explained more about the release process. I need to explore that more to determine how builds with local changes are actually ending up -modded. Will look further into tools/build-release.sh and the Release Checklist for clues on how to modify the PR to emulate these failures.

To test I will make two additional test branches with the following changes

  • Change pyln-client version from "^24.2" to "^24.8" in the clnrest plugin and it should make a dirty version of poetry.lock. This should result in a build which is -modded.
  • Revert the changes in https://github.com/ElementsProject/lightning/pull/7612 which should result in the Repro build failing.

s373nZ avatar Oct 03 '24 19:10 s373nZ

  • Change pyln-client version from "^24.2" to "^24.8" in the clnrest plugin and it should make a dirty version of poetry.lock. This should result in a build which is -modded.

This will work only if we run poetry install first. But Dockerfile.<ubuntu-version> uses pip install instead :).

ShahanaFarooqui avatar Oct 03 '24 19:10 ShahanaFarooqui

Made a few more changes:

  • Added a phony Makefile target named version which just outputs the $(VERSION) variable that is computed by the Makefile bootstrap. Makes it available to re-use this functionality in other processes.
  • Adjusted the Repro Github action to extract the VERSION from the saved image in a similar fashion to how we are capturing the Git logs.
  • Adjusted the Assert check to read the extracted version for -modded and fail on that case as well as a -modded release file.
  • Rebased against master

s373nZ avatar Oct 04 '24 15:10 s373nZ

Testing

@ShahanaFarooqui following from our conversations above and in Discord, here is a summary and results of the test cases:

  1. Testing branch: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test
  • This branch only contains a modification to run the action on a push to the branch. Its intent is to test the PR functionality without needing to wait until the run is scheduled.
  • Interestingly, after I rebased against master, this branch's run succeeded for noble and jammy, yet failed for focal as there appears to be local modifications after the repro build. I tested this locally and reproduced the local modifications under focal, so I'm actually wondering if there is a pre-existing -modded build happening on focal.
  • Due to the above, focal builds fail across all the tests.
  1. Test case 1: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-1
  • This branch has reverted commit https://github.com/ElementsProject/lightning/commit/21fcd4594b9db99017e6c6f4cca750566d1b2969 here
  • RUN FAILS as expected due to differences in cln-rpc/src/model.rs
  • TEST SUCCESS (AFAICT)
  1. Test case 2: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-2
  • This branch is a direct checkout of origin/release-24.08.1 with this PR's commits cherry-picked on top.
  • RUN FAILS as expected due to differences in cln-rpc/src/model.rs
  • TEST SUCCESS (AFAICT)
  1. Test case 3: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-3
  • This branch has the commit from PR #7612 reverted.
  • RUN FAILS for Noble as expected due to issue with zlib1g-dev package. Also fails for focal due to local modifications detailed in 0 across all tests.
  • TEST SUCCESS (AFAICT)
  1. Test case 4: Used the following instructions locally, but substituted jammy for noble, as there was an error compiling lowdown with jammy because it required GLIBC_2.38. May be related to recent lowdown releases in last few days?

Instructions

Failure case 1:
-------------------------------------------------------------------------------------------------------------
1: Checkout master branch
2: Build builder images: `. contrib/cl-repro.sh`
3: Run your local image: docker run -it -v $(pwd):/repo cl-repro-jammy /bin/bash
4: Compile CLN inside the builder image:
    4.1: cd /repo
    4.2: ./configure
    4.3: make
5: Dirty tree due to missing poetry shell & install:
    modified:   contrib/pyln-grpc-proto/pyln/grpc/node_pb2.py
    modified:   contrib/pyln-grpc-proto/pyln/grpc/node_pb2_grpc.py
    modified:   contrib/pyln-grpc-proto/pyln/grpc/primitives_pb2.py
-------------------------------------------------------------------------------------------------------------
  • I ran this locally with noble and then tested it against a copy of the assert logic, and it produced a -modded VERSION with the expected dirty tree.
  • TEST SUCCESS (AFAICT - I don't see how we can run this properly through a branch and the Github action, and the jammy compilation for lowdown is also concerning)

In summary, I am pretty happy with the testing so far. The big question is whether the new changes have discovered that focal repro builds are always producing dirty, -modded versions. Let me know what you think. Happy to chat on Discord or set up another meeting to go over things more efficiently. And no rush -- take your time :)

s373nZ avatar Oct 04 '24 17:10 s373nZ

  1. Testing branch: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test
  • This branch only contains a modification to run the action on a push to the branch. Its intent is to test the PR functionality without needing to wait until the run is scheduled.
  • Interestingly, after I rebased against master, this branch's run succeeded for noble and jammy, yet failed for focal as there appears to be local modifications after the repro build. I tested this locally and reproduced the local modifications under focal, so I'm actually wondering if there is a pre-existing -modded build happening on focal.
  • Due to the above, focal builds fail across all the tests.

I pushed the last commit to fix the dirty tree issue on Focal by locking the grpcio-tools version. The problem was caused by grpcio-tools version mismatch. Core lightning is using v1.62.2 as specified in poetry.lock, but Dockerfile.<ubuntu-distro> was installing v1.66.2 by default (via pip install). This led to modifications in contrib/pyln-grpc-proto/pyln/grpc/node_pb2.py, contrib/pyln-grpc-proto/pyln/grpc/node_pb2_grpc.py, and contrib/pyln-grpc-proto/pyln/grpc/primitives_pb2.py.

Solution reference: https://github.com/ElementsProject/lightning/pull/7376#issuecomment-2161102381

I also updated the Python version to 3.10.0 on Focal to ensure consistency across all three distributions, making it more future-proof.

  1. Test case 4: Used the following instructions locally, but substituted jammy for noble, as there was an error compiling lowdown with jammy because it required GLIBC_2.38. May be related to recent lowdown releases in last few days?

As discussed on Discord, lowdown releases cannot impact CLN because lowdown is locked to commit edca6ce for us..

In summary, I am pretty happy with the testing so far. The big question is whether the new changes have discovered that focal repro builds are always producing dirty, -modded versions.

Yeah, testing looks really great; we have already covered some good real-world scenarios. However, my last push did not trigger the action, and I was unable to trigger it manually too. Could you please confirm once again if the Focal build passes on the master?

ShahanaFarooqui avatar Oct 12 '24 04:10 ShahanaFarooqui

ACK c7f89c39bf453d84c3dd60e96473dfa0574a6182.

ShahanaFarooqui avatar Oct 14 '24 21:10 ShahanaFarooqui