rushstack
rushstack copied to clipboard
[rush] Change getChangedFiles to pass ref only in git diff
This is to avoid git determining merge base of given ref and current head
Summary
Related to #4589 but for rush change --verify --no-fetch --target-branch <ref>
In CI, I have the SHA of merge base given by the CI runner at env variable ($CI_MERGE_REQUEST_DIFF_BASE_SHA
), so I run the following command.
$ rush change --verify --no-fetch --target-branch $CI_MERGE_REQUEST_DIFF_BASE_SHA
However that will fail with the following output:
Found configuration in /REDACTED_PATH_TO_REPO/rush.json
Rush Multi-Project Build Tool 5.122.1 - https://rushjs.io/
Node.js version is 18.20.2 (LTS)
Starting "rush change"
The target branch is <<REDACTED_MERGE_BASE_SHA>>
ERROR: The command failed with exit code 128
fatal: <<REDACTED_MERGE_BASE_SHA>>...HEAD: no merge base
Details
I figured out that the failing codes was here in the getChangedFiles function:
https://github.com/microsoft/rushstack/blob/49d93a9cdc5f844fb37f58ae83cbbae21586a75f/libraries/rush-lib/src/logic/Git.ts#L277-L294
From my understanding from reading git diff
docs and this StackOverflow question, to retrieve a list of changed files between the given ref (branch/commit/tag/etc) and current HEAD, I do not need to retrieve the merge base (and hence do not need to be in the style of A...B
which Git will determine the merge base)
From my investigation of Rush's history, it seemed like it was added this way since 2016, so I'm not sure what's the intention there:
https://github.com/microsoft/rushstack/commit/4a4de9a2ecc246854b96b5d3e1cbc5377b1e6547#diff-26c82d4767d5297bb1f9e2d6c4329b25a0f60cd6b99c5d8d59c4ac7e14a04320R22
How it was tested
Manual test: I build Rush locally with this change and able to get that command (rush change --verify --no-fetch --target-branch <<REDACTED_MERGE_BASE>>
) working on my local repro.
Impacted documentation
The most common scenario for rush change
is rush change --target-branch main
, which is commonly used in pre-push
hooks to enforce changefiles in a repository. If the API doesn't look up the merge base as part of the call, all commits between the merge-base of HEAD
and origin/main
and the current tip of origin/main
will erroneously be reflected as part of the current set of changed files.
In a typical CI environment, there will be a merge commit at HEAD
, and the diff to calculate is simply the diff between HEAD
and HEAD~1
, which will be the merge base of HEAD
and the target branch.
Most likely the no merge base
error indicates that your CI build is performing a shallow fetch with a depth of 1; a depth of 2 should successfully find the merge base at HEAD~1
.
rush change
needs the merge base to correctly determine which projects were changed.
Is there a reason you can't clone with a depth of two?
If the API doesn't look up the merge base as part of the call, all commits between the merge-base of HEAD and origin/main and the current tip of origin/main will erroneously be reflected as part of the current set of changed files.
@dmichon-msft I see... that makes sense
Most likely the no merge base error indicates that your CI build is performing a shallow fetch with a depth of 1; a depth of 2 should successfully find the merge base at HEAD~1.
Is there a reason you can't clone with a depth of two?
@dmichon-msft @iclanton Yes I'm doing a shallow clone of depth 1 in CI build. It is not that I couldn't do a clone with depth of two, it is just that 2 is an arbitrary number... In a long lived branch, it can be 5, 10, or even 100 until it finds the "merge base". So I'm trying hard to avoid that, since the repo that I worked on has a huge commit history (like 100,000+ commits)
In the case of my CI build, since I already have the commit hash of the merge base (passed by the CI executor), it would be a waste to clone the repo with more history just for the tooling to determine the merge base again.
Is there a possibility of something like: if the given "target-branch" value is a commit hash, use a different getChangedFiles that calls git diff <ref>
instead of git diff <ref>...
?
@dmichon-msft @iclanton Yes I'm doing a shallow clone of depth 1 in CI build. It is not that I couldn't do a clone with depth of two, it is just that 2 is an arbitrary number... In a long lived branch, it can be 5, 10, or even 100 until it finds the "merge base". So I'm trying hard to avoid that, since the repo that I worked on has a huge commit history (like 100,000+ commits)
The value of 2 is based on the CI mechanism used by both Azure DevOps and GitHub for PR builds, that they create a merge commit parented on the tip of the target branch and run the pipeline against that, therefore the first parent of the current commit is always the tip of the target branch as of the time the build was queued. Creating that merge commit is a prerequisite for queueing PR builds, since if it can't be created, the source branch can't be successfully merged into the target branch.
I see, thanks for the explanation