`cancelled()` returns `false` after workflow cancel
Describe the bug
I have a branch protection rule that requires a pr-validation job to be successful for merging a PR:
pr-validation:
needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
runs-on: ubuntu-latest
if: ${{ !cancelled() }}
steps:
- if: contains(needs.*.result, 'failure')
run: exit 1
Some of the jobs in the needs list are complex and use matrixes to trigger reusable workflows, some of them also use matrixes and trigger more reusable workflows.
As you can see here, this job was running successfully when:
Expected behavior
- The workflow was canceled, so it wasn't expected to run.
- Some of the jobs in the
needslist failed before the workflow cancelation, so if the job was to run, theexit 1step was expected to be executed.
But for some reason, the job ran and ended successfully, and approved the PR.
Runner Version and Platform
Current runner version: '2.311.0'
Operating System
Ubuntu
22.04.3
LTS
Runner Image
Image: ubuntu-22.04
Version: 20231211.1.0
Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20231211.1/images/ubuntu/Ubuntu2204-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20231211.1
Runner Image Provisioner
2.0.321.1
GITHUB_TOKEN Permissions
OS of the machine running the runner? OSX/Windows/Linux/...
What's not working?
Please include error messages and screenshots.
Job Log Output
If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.
Runner and Worker's Diagnostic Logs
If applicable, add relevant diagnostic log information. Logs are located in the runner's _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.
Debug run output of the exit 1 step:
##[debug]Evaluating condition for step: 'Run exit 1'
##[debug]Evaluating: (success() && contains(needs[*].result, 'failure'))
##[debug]Evaluating And:
##[debug]..Evaluating success:
##[debug]..=> true
##[debug]..Evaluating contains:
##[debug]....Evaluating Index:
##[debug]......Evaluating Index:
##[debug]........Evaluating needs:
##[debug]........=> Object
##[debug]........Evaluating Wildcard:
##[debug]........=> '*'
##[debug]......=> Array
##[debug]......Evaluating String:
##[debug]......=> 'result'
##[debug]....=> Array
##[debug]....Evaluating String:
##[debug]....=> 'failure'
##[debug]..=> false
##[debug]=> false
##[debug]Expanded: (true && contains(Array, 'failure'))
##[debug]Result: false
Just happened again, and I think I figured out why:
- The external/first matrix uses workflow calls with the default fail-fast. When one of the entries fails, it cancels the other matrix jobs and marks the matrix result as
cancelled. - When I tested a similar scenario, but the matrix has its own steps (and not calling a reusable workflow), the matrix result was
failure.
I couldn't find any mention of this inconsistent behavior.
Hey @GuyAv46, here is an example which cancels the final job if some of previous jobs had failed.
name: test_require_status_check
on:
workflow_dispatch:
pull_request:
push:
jobs:
pre-requisite:
runs-on: ubuntu-latest
steps:
- name: Get Req
id: get-req
run: echo "get req"
build-one:
name: build one
runs-on: ubuntu-latest
needs: pre-requisite
steps:
- run: |
echo "build-one"
exit 1
build-two:
name: build two
runs-on: ubuntu-latest
needs: pre-requisite
steps:
- run: echo "build-two"
test:
name: Test initial
needs:
- pre-requisite
- build-two
runs-on: ubuntu-latest
steps:
- run: |
echo "Testing builds"
exit 1
test-more:
name: Test more
needs:
- pre-requisite
- build-two
runs-on: ubuntu-latest
steps:
- run: echo "Test more"
results:
runs-on: ubuntu-latest
if: ${{ always() && contains(needs.*.result, 'failure') == false}}
needs:
- build-one
- build-two
- test
- test-more
steps:
- run: echo "Workflows have succeeded!"
And below you can see diagram image
The problem is that if I only skip the task, it's enough for the required tasks for the branch protection rules, and the PR is approved. I had to force this job to run and fail if there was a fail/cancel.
My solution was to add "cancelled" to the expected output for failure:
pr-validation:
needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
runs-on: ubuntu-latest
if: ${{ !cancelled() }}
steps:
- if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
run: exit 1
I'm seeing this behaviour as well when using any of the various of if cancelled()I've seen. Here's one of the test steps I have in my workflow:
echo-test:
runs-on: ubuntu-latest
if: ${{ cancelled() }}
run: echo "Job was cancelled!"
I've tried using just if: cancelled() along with if: ${{ always() }} and - if: ${{ needs.*.result == 'cancelled' }} . I was going to raise a separate issue but it seems like we're seeing the same behaviour, so probably best not to duplicate work for the team.
After I thought I found the workaround I needed, it happened again.
This time my pr-validation checks for both failure and cancelled, but somehow the result was neither.
pr-validation:
needs:
- docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
- get-latest-redis-tag
- get-latest-7-0-redis-tag
- get-latest-6-2-redis-tag
- get-latest-6-0-redis-tag
- test-linux-x86_64
- test-linux-aarch64
- test-macos
- coverage
- sanitize
runs-on: ubuntu-latest
if: ${{ !cancelled() }}
steps:
- if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
run: exit 1
Debug logging is pretty useless if you can't see the content of the Array. I tried to reproduce it but was only able to achieve either 'failure' or 'cancelled'
I figured out what is the problem.
As I mentioned before, if a matrix is used with a reusable workflow, and one of the matrix runs fails, the result of the matrix is CANCELLED. I tried to work around this, but...
If you have a workflow containing a matrix that calls a reusable workflow, and in this workflow, there is another matrix that uses another reusable workflow, if one of the deepest workflows fails, its parent matrix will report cancelled, which will cause the grandparent matrix TO IGNORE THE RETURN STATUS AND REPORT SUCCESS. This means that I must flatten out my matrix so a cancelled status won't be ignored. I can't have a fail-fast in this case, but at least my CI won't pass on a failing flow.
I hope this will get some attention soon
Using ARR variable to get the results of the list:
pr-validation:
needs:
- docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
- get-latest-redis-tag
- get-latest-7-0-redis-tag
- get-latest-6-2-redis-tag
- get-latest-6-0-redis-tag
- test-linux-x86_64
- test-macos
- coverage
- sanitize
runs-on: ubuntu-latest
if: ${{ !cancelled() }}
env:
ARR: ${{ toJson(needs.*.result) }}
steps:
- if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
run: exit 1
- run: exit 1
I came across this issue as well. From what I can tell, it's specific to having a matrix that calls a reusable workflow. When one job of the matrix fails, and others fail fast, it reports a cancelled status but the cancelled() status check function doesn't return true.
I have a simple reproduction in this repo with runs showing the behavior, and also started this discussion post yesterday.
In the simple case of a single matrix (I don't have any matrix calling another matrix), using if: ${{ !contains(needs.*.result, 'cancelled') }} instead of if: ${{ !cancelled() }} is good enough.
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.