arrow
arrow copied to clipboard
GH-41102: [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM})
Rationale for this change
As per @kou's suggestion in #40956, we should create unique git tags (e.g. apache-arrow-{MAJOR}.{MINOR}.{VERSION}-rc{RC_NUM}) instead re-using the same git tag (apache-arrow-{MAJOR}.{MINOR}.{VERSION}) for each release candidate. The official release candidate tag (apache-arrow-{MAJOR}.{MINOR}.{VERSION}) should be created only after a release candidate is voted on and accepted. This "official" release tag should point to the same object in the git database as the accepted release candidate tag.
The new release workflow could look like the following:
- Create a apache-arrow-X.Y.Z-rc0 tag for X.Y.Z RC0
- (Found a problem for X.Y.Z RC0)
- Create a apache-arrow-X.Y.Z-rc1 tag for X.Y.Z RC1
- Vote
- Passed
- Create a apache-arrow-X.Y.Z tag from apache-arrow-X.Y.Z-rc1 ike apache/arrow-adbc and apache/arrow-flight-sql-postgresql do
See @kou's comment for more details.
What changes are included in this PR?
- Updated
dev/release/01-prepare.shto create release-candidate-specific git tags (e.g.apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}). - Updated scripts in
dev/releaseto use the new git tag name. - Added GitHub Workflow file
publish_release_candidate.yml. This workflow is triggered when a release candidate git tag is pushed and creates a Prerelease GitHub Release. - Added logic to
dev/release/02-post-binary.shto create and push the release git tag (i.e.apache-arrow-{MAJOR}.{MINOR}.{PATCH}). - Added GitHub Workflow
publish_release.yml. This workflow is triggered when the release tag is pushed and creates a GitHub Release for the approved release (i.e. the voted upon release). - Added
dev/release/post-16-delete-release-candidates.shto delete the release candidate git tags and their associated GitHub Releases. - Updated
docs/developers/release.rstwith the new steps.
Are these changes tested?
- We were not able to verify the changes made to the scripts in
dev/release. Any suggestions on how we can verify these scripts would be much appreciated :) - We did test the new GitHub Workflows (
publish_release_candidate.ymlandpublish_release.yml) work as intended by pushing git tags tomathworks/arrow.
Are there any user-facing changes?
No.
Open Questions
- We noticed that apache/arrow-flight-sql-postgresql does not delete the release candidate Prereleases from their GitHub Releases area. Should we be doing the same? Or would it be preferable to just delete the the release candidates without deleting the release candidate tags.
- We're not that familiar with ruby, so we're not sure if the changes we made to
dev/release/02-source-test.rbmake sense.
Future Directions
- Continue working on #40956
- Add logic to auto-sign release artifacts in GitHub Actions Workflows.
- GitHub Issue: #41102
I had a first look at the workflows only. I think we can just merge them into one and run|skip some steps with
if:when the triggering tag has the-rcsuffix.
That's a good point. I'll go ahead and merge the yml files together.
Going to test out the new release.yml workflow on mathworks/arrow. I'll post a comment once I'm done with that.
I just noticed we might need INFRA approval before merginging this:
Automated services MUST NOT push data to a repository or branch that is subject to official release as a software package by the project, unless the project secures specific prior authorization of the workflow from Infrastructure.
https://infra.apache.org/github-actions-policy.html
Kind of depends on if creating a release counts as 'pushing data to the repo'. I actually don't think it does because its Github only and doesn't affect the 'arrow.git' contents. But I will ask.
I was about to open an issue for INFRA to look over the workflow when I noticed that this PR also contains the changes from #40956 (+ some new changes on the workflows) is this intentional or do we want to move those changes back out of this PR? I just need clarity which PR I send to INFRA ^^
Just a quick update. I tested the new release_candidate.yml and release.yml workflows on the mathworks/arrow repository by pushing the tags test-apache-arrow-16.0.0-rc0 and test-apache-arrow-16.0.0 to mathworks/arrow. It's worth mentioning I modified the workflow files to expect the tag names to start with the prefix test (I didn't want to push tags whose names matched the patterns apache-arrow-X.Y.Z and apache-arrow-X.Y.Z-rcN).
Both the release_candidate and the release workflows passed. You can view the generated releases in mathworks/arrow's GitHub Releases area.
I'll delete those releases once this PR is done. For now, I'll leave them up to serve as a proof of concept - unless you prefer I delete them asap.
I was about to open an issue for INFRA to look over the workflow when I noticed that this PR also contains the changes from #40956 (+ some new changes on the workflows) is this intentional or do we want to move those changes back out of this PR? I just need clarity which PR I send to INFRA ^^
I think the plan is to first submit this PR and then rebase #40956 once this PR is merged. While working on #40956, I realized it made more sense to first add "generic" release candidate and release workflows before adding the MATLAB specific logic. Once this PR is merged, I'll go back to working on #40956. Sorry for the confusion!
Nice, great progress!
it made more sense to first add "generic" release candidate and release workflows
Oh yeah that makes sense, though I do think we should also add the official tarball + signature to the releases. Looking at the release scripts the only change to make would be to upload the tarball to svn before pushing the (rc) tag, which triggers this workflow where we could then pull from dist.a.o .
I have opened a ticket with INFRA https://issues.apache.org/jira/browse/INFRA-25708
before we merge this
We also have to wait for infra to review this.
I want to move the process that generates apache-arrow-X.Y.Z.tar.gz from 02-source.sh to .github/workflows/release_candidate.yml as a separated task.
Then let's also make the workflow sign it and adhere to the requirements listed here, which will also need a review by INFRA/security so we can get a 2 for 1 :)
Sorry I accidentally closed the pull request - just re-opened it.
+1 but we should wait for a review from @raulcd before we merge this. (@raulcd may review this after 16.0.0 release.)
I want to move the process that generates
apache-arrow-X.Y.Z.tar.gzfrom02-source.shto.github/workflows/release_candidate.ymlas a separated task. Something like:diff --git a/dev/release/02-source.sh b/dev/release/02-source.sh index 27ca67a76e..874c1955f8 100755 --- a/dev/release/02-source.sh +++ b/dev/release/02-source.sh @@ -57,33 +57,12 @@ echo "Using commit $release_hash" tarball=apache-arrow-${version}.tar.gz -rm -rf ${tag} -# be conservative and use the release hash, even though git produces the same -# archive (identical hashes) using the scm tag -(cd "${SOURCE_TOP_DIR}" && \ - git archive ${release_hash} --prefix ${tag}/) | \ - tar xf - - -# Resolve all hard and symbolic links. -# If we change this, we must change ArrowSources.archive in -# dev/archery/archery/utils/source.py too. -rm -rf ${tag}.tmp -mv ${tag} ${tag}.tmp -cp -R -L ${tag}.tmp ${tag} -rm -rf ${tag}.tmp - -# Create a dummy .git/ directory to download the source files from GitHub with Source Link in C#. -dummy_git=${tag}/csharp/dummy.git -mkdir ${dummy_git} -pushd ${dummy_git} -echo ${release_hash} > HEAD -echo '[remote "origin"] url = https://github.com/apache/arrow.git' >> config -mkdir objects refs -popd - -# Create new tarball from modified source directory -tar czf ${tarball} ${tag} -rm -rf ${tag} +rm -f ${tarball} +gh release download \ + ${tag} \ + --repo apache/arrow \ + --dir . \ + --pattern "${tarball}" if [ ${SOURCE_RAT} -gt 0 ]; then "${SOURCE_DIR}/run-rat.sh" ${tarball}
This makes sense to me. However, I think it will require re-ordering the scripts to ensure we push the RC tag after the crossbow job containing all build artifacts - specifically the MLTBX file - is finished. Otherwise, we won't be able to attach the MLTBX file - and potentially other build artifacts - as assets to the release candidate. What do you think?
+1 but we should wait for a review from @raulcd before we merge this. (@raulcd may review this after 16.0.0 release.) I want to move the process that generates
apache-arrow-X.Y.Z.tar.gzfrom02-source.shto.github/workflows/release_candidate.ymlas a separated task. Something like:diff --git a/dev/release/02-source.sh b/dev/release/02-source.sh index 27ca67a76e..874c1955f8 100755 --- a/dev/release/02-source.sh +++ b/dev/release/02-source.sh @@ -57,33 +57,12 @@ echo "Using commit $release_hash" tarball=apache-arrow-${version}.tar.gz -rm -rf ${tag} -# be conservative and use the release hash, even though git produces the same -# archive (identical hashes) using the scm tag -(cd "${SOURCE_TOP_DIR}" && \ - git archive ${release_hash} --prefix ${tag}/) | \ - tar xf - - -# Resolve all hard and symbolic links. -# If we change this, we must change ArrowSources.archive in -# dev/archery/archery/utils/source.py too. -rm -rf ${tag}.tmp -mv ${tag} ${tag}.tmp -cp -R -L ${tag}.tmp ${tag} -rm -rf ${tag}.tmp - -# Create a dummy .git/ directory to download the source files from GitHub with Source Link in C#. -dummy_git=${tag}/csharp/dummy.git -mkdir ${dummy_git} -pushd ${dummy_git} -echo ${release_hash} > HEAD -echo '[remote "origin"] url = https://github.com/apache/arrow.git' >> config -mkdir objects refs -popd - -# Create new tarball from modified source directory -tar czf ${tarball} ${tag} -rm -rf ${tag} +rm -f ${tarball} +gh release download \ + ${tag} \ + --repo apache/arrow \ + --dir . \ + --pattern "${tarball}" if [ ${SOURCE_RAT} -gt 0 ]; then "${SOURCE_DIR}/run-rat.sh" ${tarball}This makes sense to me. However, I think it will require re-ordering the scripts to ensure we push the RC tag after the crossbow job containing all build artifacts - specifically the MLTBX file - is finished. Otherwise, we won't be able to attach the MLTBX file - and potentially other build artifacts - as assets to the release candidate. What do you think?
On second thought, we could possibly write a "upload_release_candidate_assets" GitHub Actions that is triggered via gh workflow run and is only called after the crossbow jobs are finished. I imagine the new release workflow would look something like the following:
# Push the release candidate tag - will create the GitHub Release
git push -u apache apache-arrow-<version>-rc<rc-number>
...
# Submit binary tasks using crossbow, the command will output the crossbow build id
dev/release/03-binary-submit.sh <version> <rc-number>
# Wait for the crossbow jobs to finish
archery crossbow status <crossbow-build-id>
# Extracts build artifacts from crossbow job (using 04-binary-download) and adds them as assets to the Release Candidate in apache/arrow's GitHub Release area
gh workflow run upload_release_candidate_assets.yml -f version=<version> rc=<rc-number>
...
Thoughts?
Just an update:
After moving the release tarball generation to release_candidate.yml and using gh release download to download the tarball in 02-source.sh, the source release check (02-source-test.sh) failed because the GitHub Release from which 02-source.sh was trying to download the tarball did not exist.
To workaround this issue, I moved the tarball generation logic into a separate file called dev/release/util-create-release-tarball.sh, which release_candidate.yml calls to generate the release tarball. In 02-source.sh, if an environment variable TEST_RELEASE_SCRIPT is defined and greater than 0, then 02-source.sh invokes util-create-release-tarball.sh to generate the release tarball. Otherwise, it uses gh release download to retrieve the pre-built archive from the GitHub Releases area. Finally, I modified 02-source-test.rb to ensure TEST_RELEASE_SCRIPT is defined and set to 1 when this script invokes 02-source.sh.
Updates:
-
Added
dev/release/08-matlab-upload.shas @kou suggested. This script (1) signs the MLTBX artifacts downloaded from thecrossbowjob and (2) uploads the artifacts to the release candidate GitHub Release's area. -
I noticed a small bug in
matlab/tools/packageMatlabInterface.m. Specifically, the logic for extracting the version string only works on strings formatted like${major}.${minor}.${patch}.dev${num}). I rewrote this logic to support extracting the version string from strings formatted like${major}.${minor}.${patch}-rc${rc_num}as well. -
Modified the name of the MLTBX file generated by
matlab/tools/packageMatlabInterface.mto not include the release candidate number. Instead, the MLTBX file name just includes the version - this is similar to how the name of the tarball created bydev/release/utils-create-release-tarball.shdoes not include the release candidate number.
Next Steps:
-
I still need to add the logic for signing the release tarball to the
release_candidate.ymlworkflow. Once we hear back from INFRA regarding https://issues.apache.org/jira/browse/INFRA-25708, I'll submit a followup ticket requesting a signing key. -
Given that I added
dev/release/08-matlab-upload.shto this PR, I'll close #40956 as it is now superfluous.
Let me know if you have any questions or comments!
I haven't tried this but I think that it will work.
Hi @Kou, I just tested out the most recent changes - with a few minor tweaks - using the mathworks/arrow repo. Here's a link to mathworks/arrow's GitHub Releases area. I'll keep these releases around for a day or so just in case anyone wants to take a look.
Thanks! Let's wait for reviews from others. (We may need to ping them later.)
I have updated the issue to let INFRA know that this is now ready for their review.
@raulcd Could you review this?
Ignore my previous commit! I'm going to revert. I meant to temporarily delete all workflows except for release.yml and release_candidate.yml on a mathworks/arrow branch so that I could test the updated release workflows without triggering the other ones on push! So sorry!
Should be fixed now. Sorry about that!
Just verified the updated versions of release_candidate.yml and release.yml work as expected as expected on mathworks/arrow.
Hi @assignUser,
I just wanted get your thoughts about moving forward with this PR after the jira ticket is resolved.
My understanding was that we can't merge this PR until we add the automated signing logic to the release_candidate.yml workflow. My original thought process was that we would need to wait to do this work until INFRA-25708 is resolved. However, I'm wondering if it makes sense to start that work now so that we can merge this PR in time for the next release. If I do add the automated signing logic, I could either make a separate jira ticket or add a comment to the existing one.
What do you think?
Best, Sarah
However, I'm wondering if it makes sense to start that work now so that we can merge this PR in time for the next release.
Sounds like a great idea! I looked for a reference issue in JIRA that documents the process: https://issues.apache.org/jira/browse/INFRA-25610
Hm I just noticed that "All artifacts being signed can be built reproducibly" is a requirement for automated signing. I don't know that we fulfill that yet (leaning towards no?). I'll have to look into it and see how hard this would be to implement. Though afaik we only sign the tarball, so that shouldn't be too hard to make work? Getting it to work with all binary artifacts would probably be a different story.
After looking into it we do sign a number of additional things (e.g. java, python and C# binaries...), so the reproducible build requirement might block this for a bit, sorry. I will look into what is actually required and how we can implement it.
Appreciate all your work on this @sgilmore10 !!
Thanks for looking into this @assignUser! I appreciate all the help!
In that case, I'll hold off on working on that step. Maybe we can do that in a separate PR once we can meet the "reproducible build" requirement?
Thanks again!
Hi @assignUser,
Given that the reproducible builds requirement may block adding automated release signing for a bit, how do you feel about adding that functionality in a separate PR? If we go that route, I think I need to update dev/release/02-source.sh to sign the downloaded tarball and then upload the signed tarball to the GitHub Releases area.
How does that sound?
Best, Sarah
Given that the reproducible builds requirement may block adding automated release signing for a bit, how do you feel about adding that functionality in a separate PR?
+1
If we go that route, I think I need to update
dev/release/02-source.shto sign the downloaded tarball and then upload the signed tarball to the GitHub Releases area.
We don't need to do it. We don't need to work on automated release signing for both of the source archive and the MATLAB binary at once.
@raulcd Do you have any more comments? Can we merge this?
Hi @kou,
I think we still need to wait on INFRA to approve INFRA-25708 before we can merge this PR.
To clarify, do you think we could ask INFRA for a signing key for just the source tarballs? I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.
I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.
That's also my understanding (nothing would prevent us to sign non-reproducible binaries with the 'just for source'-key).
I think we still need to wait on INFRA to approve INFRA-25708 before we can merge this PR.
I don't think that we need an approval from INFRA for this PR because this PR doesn't use automated signing. We sill sign manually in dev/release/07-matlab-upload.sh. It's same as the current our release process for our artifacts including the source archive, .deb/.rpm and so on.
To clarify, do you think we could ask INFRA for a signing key for just the source tarballs?
I think so.
I think that we can expand automated signing targets step by step. For example, we enable automated signing only for the source archive and then enable it for .deb too and so on.
I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.
Oh, I hadn't thought of that.