skip-duplicate-actions
skip-duplicate-actions copied to clipboard
Action on pull_request trigger is skipped even though base branch content changed
I have the following workflow defined, which enables concurrent skipping and then simply echoes whether a run has been skipped or not.
name: CI
on:
push:
pull_request:
jobs:
find-duplicate-workflows:
runs-on: [ self-hosted ]
steps:
- id: skip-check
uses: fkirc/[email protected]
with:
concurrent_skipping: "same_content_newer"
skip_after_successful_duplicate: "true"
- run: echo ${{ steps.skip-check.outputs.should_skip }}
Now I add a new file that only exists in the main
branch, and an additional file that only exists in a pr
branch. Now I create a pull request to merge pr
into main
. The push
triggers for both branches are ran as expected.
The pull_request
triggers must not be skipped however, since the merge will contain both files, so the checked out contents differ from both main
and pr
. skip-duplicate-actions however reports should-skip = true
.
See our test repo for a reproduction. Runs 6 and 11 are from push events, and 12 is from a pull_request
trigger concurrent with its sibling push
.
Thank you for the detailed report, @fknorr!
As you can see in the logs of the pull_request
workflow run (#12), skip-duplicate-actions
has detected that the exact same files (same tree hash) have already been checked in the push
workflow run (#11) and therefore reports should-skip = true
. Apparently, the push
workflow run was performed a little before the pull_request
workflow run.
Please note that both workflow runs, pull_request
and push
, are performed against the same ref (the merge commit).
So, this is working as intended.
However, once the pull request (https://github.com/celerity/test-skip-duplicate-actions/pull/2) is merged into the main
branch, skip-duplicate-actions
should report should-skip = false
because of the added commit in the main
branch.
Hope this helps!
Thanks for taking a look at this @paescuj. I have updated the test setup to report in detail which revisions are checked out and what files are present (also I have deleted all old runs so there's no unexpected interactions).
The actions script now runs
git rev-parse HEAD
ls
echo ${{should_skip}}
- push trigger #19 runs for the main branch and reports
fb314dc3155556864a8d5d3fc063e0372cd0b582
file-on-main.txt
should_skip=false
- push trigger #20 runs for the un-merged PR branch and reports
59164db4e10d9f17a313bd5b98e0a5c3e19c42e0
file-in-pr.txt
should_skip=false
- pull_request trigger #21 runs for the implicit PR merge commit and reports
bd633cec460a9c1af3247d70fca45942903b2baa
file-in-pr.txt
file-on-main.txt
should_skip=true
The skip-duplicate-actions
logs state that this run can be skipped since the same files have been checked in # 20, but that's not actually true. From my understanding push triggers always run on the unmerged branch, and pull_request triggers check GitHub's implicit merge commit.
From my understanding push triggers always run on the unmerged branch, and pull_request triggers check GitHub's implicit merge commit.
Of course, you're totally right! I must have been a bit tired and got things mixed up... 😄🙈
Indeed, the behavior of skip-duplicate-actions
seems to be slightly wrong in such scenarios. Thank you very much for recognizing this! I'll come up with a solution in the next days and report back here.
Unfortunately, I don't see any way to get the tree SHA from the merge commit for all workflow runs that were triggered by the pull request event. (See #270 for an unsuccessful attempt) So as a conclusion:
- [ ] When I do find the time, I'll demand GitHub to include this information in the Workflow runs API.
- [ ] Document that the current checks of
skip-duplicate-actions
are slightly wrong onpull_request
events and when there are changes in the base branch.
Thanks for giving this a try!
I have looked into what actions/checkout does on pull requests, since it also needs to figure out the merge SHA on a pull_request
trigger. Its Readme talks about $GITHUB_SHA
and $GITHUB_REF
, and indeed the workflow documentation suggests that this information is passed to jobs in environment variables for any event type.
From this it seems like the Action should not rely on the API at all for figuring out what commit to run on but do a getenv
instead.
Edit: Oh well, this only helps to get the commit SHA in the current job of course, not the "remote ones" (at least not without some log/communication hackery).
(at least not without some log/communication hackery).
One such hack could be to upload the SHA as a workflow artifact. Ugly but might just work..?
Edit: Oh well, this only helps to get the commit SHA in the current job of course, not the "remote ones" (at least not without some log/communication hackery).
Exactly 😄
One such hack could be to upload the SHA as a workflow artifact. Ugly but might just work..?
Yeah, I was thinking about something like this. One "problem" with GitHub Actions is that you always need to keep an eye on the amount of API requests you're doing, otherwise you risk to run into the rate limit.
Possibly, we could upload a map with the SHAs from the last 100 runs in each workflow run and then just fetch it from the last run, respectively.
Another concern is the execution time of skip-duplicate-actions
which I'd like to keep as low as possible.
I might give this a try when I find some time...
A conservative solution could be to identify the cases we definitely know are duplicates, and keep the jobs for which we cannot be sure.
- In the current run, we always know the true "checkout SHA" from the
$GITHUB_SHA
. - For API queries on other concurrent / finished jobs, we only consider those that were not triggered on
pull_request
, because we know that for non-pull_request
s the "API SHA" and "checkout SHA" will match.
That means we can actively skip a pull_request
trigger if we find a matching push
trigger, but not the other way round. I believe the remaining triggers would also behave like push
es in that sense.