rushstack icon indicating copy to clipboard operation
rushstack copied to clipboard

[rush] Change getChangedFiles to pass ref only in git diff

Open kenrick95 opened this issue 9 months ago • 4 comments

This is to avoid git determining merge base of given ref and current head

Summary

Related to #4589 but for rush change --verify --no-fetch --target-branch <ref>

In CI, I have the SHA of merge base given by the CI runner at env variable ($CI_MERGE_REQUEST_DIFF_BASE_SHA), so I run the following command.

$ rush change --verify --no-fetch --target-branch $CI_MERGE_REQUEST_DIFF_BASE_SHA

However that will fail with the following output:

Found configuration in /REDACTED_PATH_TO_REPO/rush.json
Rush Multi-Project Build Tool 5.122.1 - https://rushjs.io/
Node.js version is 18.20.2 (LTS)
Starting "rush change"
The target branch is <<REDACTED_MERGE_BASE_SHA>>
ERROR: The command failed with exit code 128
fatal: <<REDACTED_MERGE_BASE_SHA>>...HEAD: no merge base

Details

I figured out that the failing codes was here in the getChangedFiles function:

https://github.com/microsoft/rushstack/blob/49d93a9cdc5f844fb37f58ae83cbbae21586a75f/libraries/rush-lib/src/logic/Git.ts#L277-L294

From my understanding from reading git diff docs and this StackOverflow question, to retrieve a list of changed files between the given ref (branch/commit/tag/etc) and current HEAD, I do not need to retrieve the merge base (and hence do not need to be in the style of A...B which Git will determine the merge base)

From my investigation of Rush's history, it seemed like it was added this way since 2016, so I'm not sure what's the intention there:

https://github.com/microsoft/rushstack/commit/4a4de9a2ecc246854b96b5d3e1cbc5377b1e6547#diff-26c82d4767d5297bb1f9e2d6c4329b25a0f60cd6b99c5d8d59c4ac7e14a04320R22

How it was tested

Manual test: I build Rush locally with this change and able to get that command (rush change --verify --no-fetch --target-branch <<REDACTED_MERGE_BASE>>) working on my local repro.

Impacted documentation

kenrick95 avatar May 06 '24 08:05 kenrick95

The most common scenario for rush change is rush change --target-branch main, which is commonly used in pre-push hooks to enforce changefiles in a repository. If the API doesn't look up the merge base as part of the call, all commits between the merge-base of HEAD and origin/main and the current tip of origin/main will erroneously be reflected as part of the current set of changed files.

In a typical CI environment, there will be a merge commit at HEAD, and the diff to calculate is simply the diff between HEAD and HEAD~1, which will be the merge base of HEAD and the target branch.

Most likely the no merge base error indicates that your CI build is performing a shallow fetch with a depth of 1; a depth of 2 should successfully find the merge base at HEAD~1.

dmichon-msft avatar May 06 '24 16:05 dmichon-msft

rush change needs the merge base to correctly determine which projects were changed.

Is there a reason you can't clone with a depth of two?

iclanton avatar May 06 '24 18:05 iclanton

If the API doesn't look up the merge base as part of the call, all commits between the merge-base of HEAD and origin/main and the current tip of origin/main will erroneously be reflected as part of the current set of changed files.

@dmichon-msft I see... that makes sense

Most likely the no merge base error indicates that your CI build is performing a shallow fetch with a depth of 1; a depth of 2 should successfully find the merge base at HEAD~1.

Is there a reason you can't clone with a depth of two?

@dmichon-msft @iclanton Yes I'm doing a shallow clone of depth 1 in CI build. It is not that I couldn't do a clone with depth of two, it is just that 2 is an arbitrary number... In a long lived branch, it can be 5, 10, or even 100 until it finds the "merge base". So I'm trying hard to avoid that, since the repo that I worked on has a huge commit history (like 100,000+ commits)

In the case of my CI build, since I already have the commit hash of the merge base (passed by the CI executor), it would be a waste to clone the repo with more history just for the tooling to determine the merge base again.

Is there a possibility of something like: if the given "target-branch" value is a commit hash, use a different getChangedFiles that calls git diff <ref> instead of git diff <ref>...?

kenrick95 avatar May 07 '24 01:05 kenrick95

@dmichon-msft @iclanton Yes I'm doing a shallow clone of depth 1 in CI build. It is not that I couldn't do a clone with depth of two, it is just that 2 is an arbitrary number... In a long lived branch, it can be 5, 10, or even 100 until it finds the "merge base". So I'm trying hard to avoid that, since the repo that I worked on has a huge commit history (like 100,000+ commits)

The value of 2 is based on the CI mechanism used by both Azure DevOps and GitHub for PR builds, that they create a merge commit parented on the tip of the target branch and run the pipeline against that, therefore the first parent of the current commit is always the tip of the target branch as of the time the build was queued. Creating that merge commit is a prerequisite for queueing PR builds, since if it can't be created, the source branch can't be successfully merged into the target branch.

dmichon-msft avatar May 07 '24 16:05 dmichon-msft

I see, thanks for the explanation

kenrick95 avatar May 14 '24 08:05 kenrick95