arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-41102: [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM})

Open sgilmore10 opened this issue 1 year ago • 21 comments
trafficstars

Rationale for this change

As per @kou's suggestion in #40956, we should create unique git tags (e.g. apache-arrow-{MAJOR}.{MINOR}.{VERSION}-rc{RC_NUM}) instead re-using the same git tag (apache-arrow-{MAJOR}.{MINOR}.{VERSION}) for each release candidate. The official release candidate tag (apache-arrow-{MAJOR}.{MINOR}.{VERSION}) should be created only after a release candidate is voted on and accepted. This "official" release tag should point to the same object in the git database as the accepted release candidate tag.

The new release workflow could look like the following:

  1. Create a apache-arrow-X.Y.Z-rc0 tag for X.Y.Z RC0
  2. (Found a problem for X.Y.Z RC0)
  3. Create a apache-arrow-X.Y.Z-rc1 tag for X.Y.Z RC1
  4. Vote
  5. Passed
  6. Create a apache-arrow-X.Y.Z tag from apache-arrow-X.Y.Z-rc1 ike apache/arrow-adbc and apache/arrow-flight-sql-postgresql do

See @kou's comment for more details.

What changes are included in this PR?

  1. Updated dev/release/01-prepare.sh to create release-candidate-specific git tags (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}).
  2. Updated scripts in dev/release to use the new git tag name.
  3. Added GitHub Workflow file publish_release_candidate.yml. This workflow is triggered when a release candidate git tag is pushed and creates a Prerelease GitHub Release.
  4. Added logic to dev/release/02-post-binary.sh to create and push the release git tag (i.e. apache-arrow-{MAJOR}.{MINOR}.{PATCH}).
  5. Added GitHub Workflow publish_release.yml. This workflow is triggered when the release tag is pushed and creates a GitHub Release for the approved release (i.e. the voted upon release).
  6. Added dev/release/post-16-delete-release-candidates.sh to delete the release candidate git tags and their associated GitHub Releases.
  7. Updated docs/developers/release.rst with the new steps.

Are these changes tested?

  1. We were not able to verify the changes made to the scripts in dev/release. Any suggestions on how we can verify these scripts would be much appreciated :)
  2. We did test the new GitHub Workflows (publish_release_candidate.yml and publish_release.yml) work as intended by pushing git tags to mathworks/arrow.

Are there any user-facing changes?

No.

Open Questions

  1. We noticed that apache/arrow-flight-sql-postgresql does not delete the release candidate Prereleases from their GitHub Releases area. Should we be doing the same? Or would it be preferable to just delete the the release candidates without deleting the release candidate tags.
  2. We're not that familiar with ruby, so we're not sure if the changes we made to dev/release/02-source-test.rb make sense.

Future Directions

  1. Continue working on #40956
  2. Add logic to auto-sign release artifacts in GitHub Actions Workflows.
  • GitHub Issue: #41102

sgilmore10 avatar Apr 10 '24 18:04 sgilmore10

I had a first look at the workflows only. I think we can just merge them into one and run|skip some steps with if: when the triggering tag has the -rc suffix.

That's a good point. I'll go ahead and merge the yml files together.

sgilmore10 avatar Apr 11 '24 16:04 sgilmore10

Going to test out the new release.yml workflow on mathworks/arrow. I'll post a comment once I'm done with that.

sgilmore10 avatar Apr 11 '24 19:04 sgilmore10

I just noticed we might need INFRA approval before merginging this:

Automated services MUST NOT push data to a repository or branch that is subject to official release as a software package by the project, unless the project secures specific prior authorization of the workflow from Infrastructure.

https://infra.apache.org/github-actions-policy.html

Kind of depends on if creating a release counts as 'pushing data to the repo'. I actually don't think it does because its Github only and doesn't affect the 'arrow.git' contents. But I will ask.

assignUser avatar Apr 12 '24 01:04 assignUser

I was about to open an issue for INFRA to look over the workflow when I noticed that this PR also contains the changes from #40956 (+ some new changes on the workflows) is this intentional or do we want to move those changes back out of this PR? I just need clarity which PR I send to INFRA ^^

assignUser avatar Apr 12 '24 18:04 assignUser

Just a quick update. I tested the new release_candidate.yml and release.yml workflows on the mathworks/arrow repository by pushing the tags test-apache-arrow-16.0.0-rc0 and test-apache-arrow-16.0.0 to mathworks/arrow. It's worth mentioning I modified the workflow files to expect the tag names to start with the prefix test (I didn't want to push tags whose names matched the patterns apache-arrow-X.Y.Z and apache-arrow-X.Y.Z-rcN).

Both the release_candidate and the release workflows passed. You can view the generated releases in mathworks/arrow's GitHub Releases area.

I'll delete those releases once this PR is done. For now, I'll leave them up to serve as a proof of concept - unless you prefer I delete them asap.

sgilmore10 avatar Apr 12 '24 18:04 sgilmore10

I was about to open an issue for INFRA to look over the workflow when I noticed that this PR also contains the changes from #40956 (+ some new changes on the workflows) is this intentional or do we want to move those changes back out of this PR? I just need clarity which PR I send to INFRA ^^

I think the plan is to first submit this PR and then rebase #40956 once this PR is merged. While working on #40956, I realized it made more sense to first add "generic" release candidate and release workflows before adding the MATLAB specific logic. Once this PR is merged, I'll go back to working on #40956. Sorry for the confusion!

sgilmore10 avatar Apr 12 '24 19:04 sgilmore10

Nice, great progress!

it made more sense to first add "generic" release candidate and release workflows

Oh yeah that makes sense, though I do think we should also add the official tarball + signature to the releases. Looking at the release scripts the only change to make would be to upload the tarball to svn before pushing the (rc) tag, which triggers this workflow where we could then pull from dist.a.o .

assignUser avatar Apr 12 '24 23:04 assignUser

I have opened a ticket with INFRA https://issues.apache.org/jira/browse/INFRA-25708

assignUser avatar Apr 12 '24 23:04 assignUser

before we merge this

We also have to wait for infra to review this.

I want to move the process that generates apache-arrow-X.Y.Z.tar.gz from 02-source.sh to .github/workflows/release_candidate.yml as a separated task.

Then let's also make the workflow sign it and adhere to the requirements listed here, which will also need a review by INFRA/security so we can get a 2 for 1 :)

assignUser avatar Apr 14 '24 00:04 assignUser

Sorry I accidentally closed the pull request - just re-opened it.

sgilmore10 avatar Apr 15 '24 15:04 sgilmore10

+1 but we should wait for a review from @raulcd before we merge this. (@raulcd may review this after 16.0.0 release.)

I want to move the process that generates apache-arrow-X.Y.Z.tar.gz from 02-source.sh to .github/workflows/release_candidate.yml as a separated task. Something like:

diff --git a/dev/release/02-source.sh b/dev/release/02-source.sh
index 27ca67a76e..874c1955f8 100755
--- a/dev/release/02-source.sh
+++ b/dev/release/02-source.sh
@@ -57,33 +57,12 @@ echo "Using commit $release_hash"
 
 tarball=apache-arrow-${version}.tar.gz
 
-rm -rf ${tag}
-# be conservative and use the release hash, even though git produces the same
-# archive (identical hashes) using the scm tag
-(cd "${SOURCE_TOP_DIR}" && \
-  git archive ${release_hash} --prefix ${tag}/) | \
-  tar xf -
-
-# Resolve all hard and symbolic links.
-# If we change this, we must change ArrowSources.archive in
-# dev/archery/archery/utils/source.py too.
-rm -rf ${tag}.tmp
-mv ${tag} ${tag}.tmp
-cp -R -L ${tag}.tmp ${tag}
-rm -rf ${tag}.tmp
-
-# Create a dummy .git/ directory to download the source files from GitHub with Source Link in C#.
-dummy_git=${tag}/csharp/dummy.git
-mkdir ${dummy_git}
-pushd ${dummy_git}
-echo ${release_hash} > HEAD
-echo '[remote "origin"] url = https://github.com/apache/arrow.git' >> config
-mkdir objects refs
-popd
-
-# Create new tarball from modified source directory
-tar czf ${tarball} ${tag}
-rm -rf ${tag}
+rm -f ${tarball}
+gh release download \
+  ${tag} \
+  --repo apache/arrow \
+  --dir . \
+  --pattern "${tarball}"
 
 if [ ${SOURCE_RAT} -gt 0 ]; then
   "${SOURCE_DIR}/run-rat.sh" ${tarball}

This makes sense to me. However, I think it will require re-ordering the scripts to ensure we push the RC tag after the crossbow job containing all build artifacts - specifically the MLTBX file - is finished. Otherwise, we won't be able to attach the MLTBX file - and potentially other build artifacts - as assets to the release candidate. What do you think?

sgilmore10 avatar Apr 15 '24 15:04 sgilmore10

+1 but we should wait for a review from @raulcd before we merge this. (@raulcd may review this after 16.0.0 release.) I want to move the process that generates apache-arrow-X.Y.Z.tar.gz from 02-source.sh to .github/workflows/release_candidate.yml as a separated task. Something like:

diff --git a/dev/release/02-source.sh b/dev/release/02-source.sh
index 27ca67a76e..874c1955f8 100755
--- a/dev/release/02-source.sh
+++ b/dev/release/02-source.sh
@@ -57,33 +57,12 @@ echo "Using commit $release_hash"
 
 tarball=apache-arrow-${version}.tar.gz
 
-rm -rf ${tag}
-# be conservative and use the release hash, even though git produces the same
-# archive (identical hashes) using the scm tag
-(cd "${SOURCE_TOP_DIR}" && \
-  git archive ${release_hash} --prefix ${tag}/) | \
-  tar xf -
-
-# Resolve all hard and symbolic links.
-# If we change this, we must change ArrowSources.archive in
-# dev/archery/archery/utils/source.py too.
-rm -rf ${tag}.tmp
-mv ${tag} ${tag}.tmp
-cp -R -L ${tag}.tmp ${tag}
-rm -rf ${tag}.tmp
-
-# Create a dummy .git/ directory to download the source files from GitHub with Source Link in C#.
-dummy_git=${tag}/csharp/dummy.git
-mkdir ${dummy_git}
-pushd ${dummy_git}
-echo ${release_hash} > HEAD
-echo '[remote "origin"] url = https://github.com/apache/arrow.git' >> config
-mkdir objects refs
-popd
-
-# Create new tarball from modified source directory
-tar czf ${tarball} ${tag}
-rm -rf ${tag}
+rm -f ${tarball}
+gh release download \
+  ${tag} \
+  --repo apache/arrow \
+  --dir . \
+  --pattern "${tarball}"
 
 if [ ${SOURCE_RAT} -gt 0 ]; then
   "${SOURCE_DIR}/run-rat.sh" ${tarball}

This makes sense to me. However, I think it will require re-ordering the scripts to ensure we push the RC tag after the crossbow job containing all build artifacts - specifically the MLTBX file - is finished. Otherwise, we won't be able to attach the MLTBX file - and potentially other build artifacts - as assets to the release candidate. What do you think?

On second thought, we could possibly write a "upload_release_candidate_assets" GitHub Actions that is triggered via gh workflow run and is only called after the crossbow jobs are finished. I imagine the new release workflow would look something like the following:

# Push the release candidate tag - will create the GitHub Release
git push -u apache apache-arrow-<version>-rc<rc-number>

...

# Submit binary tasks using crossbow, the command will output the crossbow build id
dev/release/03-binary-submit.sh <version> <rc-number>

# Wait for the crossbow jobs to finish
archery crossbow status <crossbow-build-id>

# Extracts build artifacts from crossbow job (using 04-binary-download) and adds them as assets to the Release Candidate in apache/arrow's GitHub Release area
gh workflow run upload_release_candidate_assets.yml -f version=<version> rc=<rc-number>

...

Thoughts?

sgilmore10 avatar Apr 15 '24 18:04 sgilmore10

Just an update:

After moving the release tarball generation to release_candidate.yml and using gh release download to download the tarball in 02-source.sh, the source release check (02-source-test.sh) failed because the GitHub Release from which 02-source.sh was trying to download the tarball did not exist.

To workaround this issue, I moved the tarball generation logic into a separate file called dev/release/util-create-release-tarball.sh, which release_candidate.yml calls to generate the release tarball. In 02-source.sh, if an environment variable TEST_RELEASE_SCRIPT is defined and greater than 0, then 02-source.sh invokes util-create-release-tarball.sh to generate the release tarball. Otherwise, it uses gh release download to retrieve the pre-built archive from the GitHub Releases area. Finally, I modified 02-source-test.rb to ensure TEST_RELEASE_SCRIPT is defined and set to 1 when this script invokes 02-source.sh.

sgilmore10 avatar Apr 16 '24 02:04 sgilmore10

Updates:

  1. Added dev/release/08-matlab-upload.sh as @kou suggested. This script (1) signs the MLTBX artifacts downloaded from the crossbow job and (2) uploads the artifacts to the release candidate GitHub Release's area.

  2. I noticed a small bug in matlab/tools/packageMatlabInterface.m. Specifically, the logic for extracting the version string only works on strings formatted like ${major}.${minor}.${patch}.dev${num}). I rewrote this logic to support extracting the version string from strings formatted like ${major}.${minor}.${patch}-rc${rc_num} as well.

  3. Modified the name of the MLTBX file generated by matlab/tools/packageMatlabInterface.m to not include the release candidate number. Instead, the MLTBX file name just includes the version - this is similar to how the name of the tarball created by dev/release/utils-create-release-tarball.sh does not include the release candidate number.

Next Steps:

  1. I still need to add the logic for signing the release tarball to the release_candidate.yml workflow. Once we hear back from INFRA regarding https://issues.apache.org/jira/browse/INFRA-25708, I'll submit a followup ticket requesting a signing key.

  2. Given that I added dev/release/08-matlab-upload.sh to this PR, I'll close #40956 as it is now superfluous.

Let me know if you have any questions or comments!

sgilmore10 avatar Apr 16 '24 16:04 sgilmore10

I haven't tried this but I think that it will work.

Hi @Kou, I just tested out the most recent changes - with a few minor tweaks - using the mathworks/arrow repo. Here's a link to mathworks/arrow's GitHub Releases area. I'll keep these releases around for a day or so just in case anyone wants to take a look.

sgilmore10 avatar Apr 17 '24 20:04 sgilmore10

Thanks! Let's wait for reviews from others. (We may need to ping them later.)

kou avatar Apr 18 '24 03:04 kou

I have updated the issue to let INFRA know that this is now ready for their review.

assignUser avatar Apr 25 '24 16:04 assignUser

@raulcd Could you review this?

kou avatar May 16 '24 22:05 kou

Ignore my previous commit! I'm going to revert. I meant to temporarily delete all workflows except for release.yml and release_candidate.yml on a mathworks/arrow branch so that I could test the updated release workflows without triggering the other ones on push! So sorry!

sgilmore10 avatar May 17 '24 18:05 sgilmore10

Should be fixed now. Sorry about that!

sgilmore10 avatar May 17 '24 19:05 sgilmore10

Just verified the updated versions of release_candidate.yml and release.yml work as expected as expected on mathworks/arrow.

  1. Link to the release_candidate.yml workflow run.
  2. Link to the release.yml workflow run.

sgilmore10 avatar May 17 '24 19:05 sgilmore10

Hi @assignUser,

I just wanted get your thoughts about moving forward with this PR after the jira ticket is resolved.

My understanding was that we can't merge this PR until we add the automated signing logic to the release_candidate.yml workflow. My original thought process was that we would need to wait to do this work until INFRA-25708 is resolved. However, I'm wondering if it makes sense to start that work now so that we can merge this PR in time for the next release. If I do add the automated signing logic, I could either make a separate jira ticket or add a comment to the existing one.

What do you think?

Best, Sarah

sgilmore10 avatar May 29 '24 14:05 sgilmore10

However, I'm wondering if it makes sense to start that work now so that we can merge this PR in time for the next release.

Sounds like a great idea! I looked for a reference issue in JIRA that documents the process: https://issues.apache.org/jira/browse/INFRA-25610

Hm I just noticed that "All artifacts being signed can be built reproducibly" is a requirement for automated signing. I don't know that we fulfill that yet (leaning towards no?). I'll have to look into it and see how hard this would be to implement. Though afaik we only sign the tarball, so that shouldn't be too hard to make work? Getting it to work with all binary artifacts would probably be a different story.

assignUser avatar May 29 '24 15:05 assignUser

After looking into it we do sign a number of additional things (e.g. java, python and C# binaries...), so the reproducible build requirement might block this for a bit, sorry. I will look into what is actually required and how we can implement it.

Appreciate all your work on this @sgilmore10 !!

assignUser avatar May 29 '24 15:05 assignUser

Thanks for looking into this @assignUser! I appreciate all the help!

In that case, I'll hold off on working on that step. Maybe we can do that in a separate PR once we can meet the "reproducible build" requirement?

Thanks again!

sgilmore10 avatar May 29 '24 15:05 sgilmore10

Hi @assignUser,

Given that the reproducible builds requirement may block adding automated release signing for a bit, how do you feel about adding that functionality in a separate PR? If we go that route, I think I need to update dev/release/02-source.sh to sign the downloaded tarball and then upload the signed tarball to the GitHub Releases area.

How does that sound?

Best, Sarah

sgilmore10 avatar Jun 04 '24 19:06 sgilmore10

Given that the reproducible builds requirement may block adding automated release signing for a bit, how do you feel about adding that functionality in a separate PR?

+1

If we go that route, I think I need to update dev/release/02-source.sh to sign the downloaded tarball and then upload the signed tarball to the GitHub Releases area.

We don't need to do it. We don't need to work on automated release signing for both of the source archive and the MATLAB binary at once.

@raulcd Do you have any more comments? Can we merge this?

kou avatar Jun 04 '24 21:06 kou

Hi @kou,

I think we still need to wait on INFRA to approve INFRA-25708 before we can merge this PR.

To clarify, do you think we could ask INFRA for a signing key for just the source tarballs? I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.

sgilmore10 avatar Jun 04 '24 23:06 sgilmore10

I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.

That's also my understanding (nothing would prevent us to sign non-reproducible binaries with the 'just for source'-key).

assignUser avatar Jun 05 '24 02:06 assignUser

I think we still need to wait on INFRA to approve INFRA-25708 before we can merge this PR.

I don't think that we need an approval from INFRA for this PR because this PR doesn't use automated signing. We sill sign manually in dev/release/07-matlab-upload.sh. It's same as the current our release process for our artifacts including the source archive, .deb/.rpm and so on.

To clarify, do you think we could ask INFRA for a signing key for just the source tarballs?

I think so. I think that we can expand automated signing targets step by step. For example, we enable automated signing only for the source archive and then enable it for .deb too and so on.

I was under the impression INFRA would not give out a signing key to use only for signing the source archive, but I may be mistaken.

Oh, I hadn't thought of that.

kou avatar Jun 05 '24 02:06 kou