runner `cancelled()` returns `false` after workflow cancel

Describe the bug

I have a branch protection rule that requires a pr-validation job to be successful for merging a PR:

  pr-validation:
    needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure')
        run: exit 1

Some of the jobs in the needs list are complex and use matrixes to trigger reusable workflows, some of them also use matrixes and trigger more reusable workflows.

As you can see here, this job was running successfully when:

Expected behavior

The workflow was canceled, so it wasn't expected to run.
Some of the jobs in the needs list failed before the workflow cancelation, so if the job was to run, the exit 1 step was expected to be executed.

But for some reason, the job ran and ended successfully, and approved the PR.

Runner Version and Platform

Current runner version: '2.311.0'
Operating System
  Ubuntu
  22.04.3
  LTS
Runner Image
  Image: ubuntu-22.04
  Version: 20231211.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20231211.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20231211.1
Runner Image Provisioner
  2.0.321.1
GITHUB_TOKEN Permissions

OS of the machine running the runner? OSX/Windows/Linux/...

What's not working?

Please include error messages and screenshots.

Job Log Output

If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.

Runner and Worker's Diagnostic Logs

If applicable, add relevant diagnostic log information. Logs are located in the runner's _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.

Dec 14 '23 13:12 GuyAv46

Debug run output of the exit 1 step:

 ##[debug]Evaluating condition for step: 'Run exit 1'
##[debug]Evaluating: (success() && contains(needs[*].result, 'failure'))
##[debug]Evaluating And:
##[debug]..Evaluating success:
##[debug]..=> true
##[debug]..Evaluating contains:
##[debug]....Evaluating Index:
##[debug]......Evaluating Index:
##[debug]........Evaluating needs:
##[debug]........=> Object
##[debug]........Evaluating Wildcard:
##[debug]........=> '*'
##[debug]......=> Array
##[debug]......Evaluating String:
##[debug]......=> 'result'
##[debug]....=> Array
##[debug]....Evaluating String:
##[debug]....=> 'failure'
##[debug]..=> false
##[debug]=> false
##[debug]Expanded: (true && contains(Array, 'failure'))
##[debug]Result: false

Dec 14 '23 13:12 GuyAv46

Screenshot from 2023-12-14 15-41-54

Dec 14 '23 13:12 GuyAv46

Just happened again, and I think I figured out why:

The external/first matrix uses workflow calls with the default fail-fast. When one of the entries fails, it cancels the other matrix jobs and marks the matrix result as cancelled.
When I tested a similar scenario, but the matrix has its own steps (and not calling a reusable workflow), the matrix result was failure.

I couldn't find any mention of this inconsistent behavior.

Dec 17 '23 13:12 GuyAv46

Hey @GuyAv46, here is an example which cancels the final job if some of previous jobs had failed.

name: test_require_status_check

on:
  workflow_dispatch:
  pull_request:
  push:

jobs:
  pre-requisite:
    runs-on: ubuntu-latest
    steps:
      - name: Get Req
        id: get-req
        run: echo "get req"

  build-one:
    name: build one
    runs-on: ubuntu-latest
    needs: pre-requisite
    steps:
      - run: | 
          echo "build-one"
          exit 1

  build-two:
    name: build two
    runs-on: ubuntu-latest
    needs: pre-requisite
    steps:
      - run: echo "build-two"

  test:
    name: Test initial
    needs:
      - pre-requisite
      - build-two
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "Testing builds"
          exit 1

  test-more:
    name: Test more
    needs:
      - pre-requisite
      - build-two
    runs-on: ubuntu-latest
    steps:
      - run: echo "Test more"

  results:
    runs-on: ubuntu-latest
    if: ${{ always() && contains(needs.*.result, 'failure') == false}}
    needs:
      - build-one
      - build-two
      - test
      - test-more
    steps:
      - run: echo "Workflows have succeeded!"

And below you can see diagram image

Dec 26 '23 11:12 ruvceskistefan

The problem is that if I only skip the task, it's enough for the required tasks for the branch protection rules, and the PR is approved. I had to force this job to run and fail if there was a fail/cancel.

My solution was to add "cancelled" to the expected output for failure:

  pr-validation:
    needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1

Dec 26 '23 13:12 GuyAv46

I'm seeing this behaviour as well when using any of the various of if cancelled()I've seen. Here's one of the test steps I have in my workflow:

  echo-test:

    runs-on: ubuntu-latest
    if: ${{ cancelled() }}
    run: echo "Job was cancelled!"

I've tried using just if: cancelled() along with if: ${{ always() }} and - if: ${{ needs.*.result == 'cancelled' }} . I was going to raise a separate issue but it seems like we're seeing the same behaviour, so probably best not to duplicate work for the team.

Jan 04 '24 15:01 bpp-liamhardman

After I thought I found the workaround I needed, it happened again. This time my pr-validation checks for both failure and cancelled, but somehow the result was neither.

  pr-validation:
    needs:
      - docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
      - get-latest-redis-tag
      - get-latest-7-0-redis-tag
      - get-latest-6-2-redis-tag
      - get-latest-6-0-redis-tag
      - test-linux-x86_64
      - test-linux-aarch64
      - test-macos
      - coverage
      - sanitize
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1

Jan 28 '24 15:01 GuyAv46

Debug logging is pretty useless if you can't see the content of the Array. I tried to reproduce it but was only able to achieve either 'failure' or 'cancelled'

Jan 28 '24 15:01 GuyAv46

I figured out what is the problem. As I mentioned before, if a matrix is used with a reusable workflow, and one of the matrix runs fails, the result of the matrix is CANCELLED. I tried to work around this, but... If you have a workflow containing a matrix that calls a reusable workflow, and in this workflow, there is another matrix that uses another reusable workflow, if one of the deepest workflows fails, its parent matrix will report cancelled, which will cause the grandparent matrix TO IGNORE THE RETURN STATUS AND REPORT SUCCESS. This means that I must flatten out my matrix so a cancelled status won't be ignored. I can't have a fail-fast in this case, but at least my CI won't pass on a failing flow. I hope this will get some attention soon

Jan 30 '24 12:01 GuyAv46

Using ARR variable to get the results of the list:

  pr-validation:
    needs:
      - docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
      - get-latest-redis-tag
      - get-latest-7-0-redis-tag
      - get-latest-6-2-redis-tag
      - get-latest-6-0-redis-tag
      - test-linux-x86_64
      - test-macos
      - coverage
      - sanitize
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    env:
      ARR: ${{ toJson(needs.*.result) }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
      - run: exit 1

Jan 30 '24 12:01 GuyAv46

I came across this issue as well. From what I can tell, it's specific to having a matrix that calls a reusable workflow. When one job of the matrix fails, and others fail fast, it reports a cancelled status but the cancelled() status check function doesn't return true.

I have a simple reproduction in this repo with runs showing the behavior, and also started this discussion post yesterday.

In the simple case of a single matrix (I don't have any matrix calling another matrix), using if: ${{ !contains(needs.*.result, 'cancelled') }} instead of if: ${{ !cancelled() }} is good enough.

Mar 05 '24 15:03 tgoric

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.

Mar 10 '25 00:03 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity.

Apr 07 '25 00:04 github-actions[bot]