checkout icon indicating copy to clipboard operation
checkout copied to clipboard

What does "Checkout pull request HEAD commit instead of merge commit" mean?

Open sunshowers opened this issue 5 years ago â€ĸ 17 comments

Edit 2023-10-23: I no longer believe in what I wrote below -- I've seen data which has changed my belief into checking out the merge commit is the right default. I'll leave this open since there's a lot of passion here, and maybe this deserves clearer documentation.

Hi there!

The README has a section saying how to "[c]heckout pull request HEAD commit instead of merge commit". It isn't really clear what this means, and I tried looking at the source code and it didn't really make sense to me either.

Could you add some more information about this to the README? What is a "merge commit" in this situation?

Some information that might be useful:

  • a table with the default behavior and with the behavior with ref set to ${{ github.event.pull_request.head.sha }}
  • an example commit DAG pointing at the result of this action with and without this option

My experience with CI has led me to believe that there's just one way to check out a pull request (the most recent commit on the branch at the time the PR was initiated or last updated) so when I came across this option I was really surprised.

sunshowers avatar Jan 16 '21 00:01 sunshowers

The docs below may help, let me know

https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/merging-a-pull-request

https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges

By default a merge commit is checked out. If you look at the SHA that is checked out, it doesn't match your local commit SHA. Rather it's the SHA of a merge commit (your branch + base branch)

ericsciple avatar Jan 21 '21 02:01 ericsciple

Thanks for the response!

By default a merge commit is checked out. If you look at the SHA that is checked out, it doesn't match your local commit SHA. Rather it's the SHA of a merge commit (your branch + base branch)

  1. What if the repository is not configured to have merges? The codebases I manage all maintain a linear history, and either use github's rebase UI or a land queue service.
  2. What if there are merge conflicts between the base branch and the PR? What commit gets checked out then?
  3. I will probably end up always using the head commit, but more broadly, in my personal opinion, this is not the right default. I believe the code that should be tested against is exactly what the PR author put up. What evidence and/or arguments would it take to convince a switch away from the current default?
  4. I think the answers to the above should be documented in the readme. This is genuinely surprising behavior to me and I've never experienced a CI system which does this.

sunshowers avatar Jan 21 '21 02:01 sunshowers

Would there be any interest in revisiting this decision at all?

sunshowers avatar Feb 17 '21 23:02 sunshowers

To further clarify: the merge commit is checked out for pull_request events. This action does checkout the head of a branch for push events, so that option is available for workflows without needing to configure the checkout action. This is the primary reason that I generally prefer running my builds on both push and pull_request events; a failure on only one of the builds provides valuable information in determining the cause of failure.

in my personal opinion, this is not the right default.

The default action for merging of a Pull Request is a merge commit, so (IMO) I think it is the appropriate default for the checkout action to run against what the pr "merge" button will generate (by default).

For the historical github flow (before squash and rebase merges were available on github), it would be more surprising (IMO) to have the HEAD run CI, since that doesn't provide any safety or guarantees which are the purpose of the CI jobs. Imagine a green CI check that fails immediately upon merge; that's a much more dangerous scenario since it actively promoting false negatives.

This is genuinely surprising behavior to me

Agreed.

and I've never experienced a CI system which does this.

FWIW, this is also the default behavior for travisci. (actually, I believe the default for travis is to run two builds: one for push and one for pr; but the pr build is built against the merge commit)

jasonkarns avatar Mar 03 '21 19:03 jasonkarns

I just spent 4 hours scratching my head trying to understand why I got different sha for HEAD and HEAD^ when running locally and when running in a github workflow. This explains it. Using git ls-remote was also helpful here.

I really wish this information would be more visible and the implications of it (especially for non-experts in git/github). I naively thought that running git --quiet diff HEAD~1 -- db/ would give the same result locally and when running in a pull request workflow. I wanted to compare the current and last commit. It worked locally, and in the PR user interface I can see my commit hashes. Now I understand why it didn't work (because of the hypothetical merge commit approach) but yeah, it took some time to figure out...

einarpersson avatar Mar 27 '23 15:03 einarpersson

Maybe something here about why you would like to do this (and explain why this isn't done by default). I understand that this might be "common knowledge" to some, but for me this was the first time I encountered this

image

I would like a warning sign âš ī¸ 😄 or â„šī¸

einarpersson avatar Mar 27 '23 15:03 einarpersson

  1. What if there are merge conflicts between the base branch and the PR? What commit gets checked out then?

Just want to highlight this is mentioned in the docs now:

Note: Workflows will not run on pull_request activity if the pull request has a merge conflict. The merge conflict must be resolved first.

ctcampbell avatar May 18 '23 09:05 ctcampbell

For the historical github flow (before squash and rebase merges were available on github), it would be more surprising (IMO) to have the HEAD run CI, since that doesn't provide any safety or guarantees which are the purpose of the CI jobs. Imagine a green CI check that fails immediately upon merge; that's a much more dangerous scenario since it actively promoting false negatives.

So that's actually exactly what I just bumped into with the current behaviour.

Our repository has submodules and we have some bots that tell us when these need to be updated. In our main branch these have been moving forward so the main branch is a couple of versions ahead on some of these submodules.

In the mean time, we're preparing a major release on a separate version 4.0.0 branch. We've got lots of smaller PRs being merged into that v4 branch. For one of those PRs, the tests were all green and so we merged it into the v4 branch. As soon as we did that, things failed in the v4 branch (same code and tests that just passed in the PR).

The only difference between the two runs in CI was the merge target... first time round the merge target was the v4 branch (which didn't have any of the submodule updates applied). Second time round, the merge target was our main branch (which did have submodule updates applied)... so different results.

Specifically (and this is maybe too much detail) in our case the submodules are stuff running in other languages. As part of our build process, we generate some wrappers to do marshalling etc. so that we can make use of native code in those modules and we check to make sure that that generated code is the same, after build, as what was checked into source control. If any of the files are dirty, CI fails as it knows something funky has happened. If we instead checked out the PR head, I think the only risk would be if some code in the PR relied on generated/marshalling code that had undergone a breaking change - but that would imply a major version bump for one of the submodules we reply on...

So at least in our specific case, it seems like we'd be better off using ref: ${{ github.event.pull_request.head.sha }}... I'm not sure if there are couter-examples where that would result in merges that fail immediately.

jamescrosswell avatar Oct 23 '23 23:10 jamescrosswell

FWIW, since my earlier post I've reversed my opinion -- I've seen proprietary data which has led me to believe that checking out the merge commit, while not as pure as always checking out the PR head, results in better outcomes overall.

sunshowers avatar Oct 23 '23 23:10 sunshowers

@sunshowers I'd be really interested to hear specific examples of how/where ref: ${{ github.event.pull_request.head.sha }} could get us in trouble... if you've got any.

jamescrosswell avatar Oct 24 '23 01:10 jamescrosswell

It's not a problem per se, it's just that the data showed that using the merge conmit provided more relevant CI results.

sunshowers avatar Oct 24 '23 02:10 sunshowers

@sunshowers I'd be really interested to hear specific examples of how/where ref: ${{ github.event.pull_request.head.sha }} could get us in trouble... if you've got any.

of course it can, for example in our CI we must run tests on content of feature branch of the PR merged with content of the dev trunk. if we run on feature branch content only, it will pass, but the moment it gets merged to trunk and code is mixed among them, it'll fail. I am actually wondering how running a CI on incoming branch only, can be useful as a quality gate for merge (i mean, after all we're usually validating that after the merge to trunk or some 'main', the code doesn't break the 'main' right? the tests run and all that. but if we don't run it on a merge commit where code is joined of both source and destination, those tests are invalid and won't protect the main branch from breaking imho). I hope I understood the discussion correctly =) .

Dmitry1987 avatar Jan 08 '24 16:01 Dmitry1987

but anyways, I came here while breaking my head searching for a solution of how to checkout the merge commit from pre-existing repo that I am caching in the ec2 AMI in order to not checkout many GBs of LFS and 180k+ files monorepo each time (so not doing a fresh checkout of PR during workflow, but manual 'git checkout' in cli, to the $GITHUB_SHA commit, which is the merge commit in pull_request event actions run). I don't understand why GITHUB_SHA leads to "not in tree" (doesn't exist) even if I fetch, and do 'pull --all' and anything else I was able to find online, nothing works like that commit is nonexistent. but if it's not, then why env variable is populated with such SHA? it should be the merge commit but I cannot check it out :( , does anyone have a suggestion? can I use this action in order to 'pull' merge commit inside existing folder? I tried to point 'path' to existing cached git repo but the action deletes it and re-clones fresh from scratch, but I need to keep existing files (at least the submodule with giant amount of LFS data) but it deletes everything when I use 'path'.

Dmitry1987 avatar Jan 08 '24 16:01 Dmitry1987

hahaaa i figured this out 🎉

          if [ "${GITHUB_EVENT_NAME}" == "pull_request" ]; then
            PR_NUMBER="${GITHUB_REF_NAME%%/*}"
            git fetch --no-tags --depth=1 origin +${GITHUB_REF}:refs/remotes/origin/pr/${PR_NUMBER}
            git checkout origin/pr/$PR_NUMBER
          else 
            git fetch --no-tags --depth=1 origin ${GITHUB_REF}
            git checkout $GITHUB_SHA
          fi

the fetch step with correct format of +${GITHUB_REF}:refs/remotes/origin/pr/${PR_NUMBER} was the key 💃

Dmitry1987 avatar Jan 08 '24 20:01 Dmitry1987

a "fresh" green CI check cannot fail immediately upon merge

This is a powerful reason for the current behaviour,

Nonetheless, it is important to bear in mind that there is a substantial distinction between a "green" checkmark in the most recent pull request and one from a week ago. Ideally, I hope that GitHub could integrate this awareness into the icon to clearly convey this difference.

vlad-obrizum avatar Jan 18 '24 10:01 vlad-obrizum

@Dmitry1987 but if we don't run it on a merge commit where code is joined of both source and destination, those tests are invalid and won't protect the main branch from breaking imho.

What if base branch is updated after pull_request workflow checks out merge commit and tests are running, PR workflow will succeed but I think these tests will also be invalid.

I don't see options other then to also run tests on push event on the base branch.

tornike avatar Mar 06 '24 15:03 tornike