woodpecker
Fix pipeline cancelling
Component
server, agent
Describe the bug
This is mainly a summary issue of https://github.com/woodpecker-ci/woodpecker/issues/833, https://github.com/woodpecker-ci/woodpecker/issues/2062 and https://github.com/woodpecker-ci/woodpecker/issues/2911
I've been trying to debug this without real success.
I've been using the local backend and can make the following observations:
- cancel pipeline while running: completely broken. The commands run to completion, and both the step and the pipeline are marked as success (https://github.com/woodpecker-ci/woodpecker/issues/2911)
- cancelling a pending pipeline seems to work for me
On ci.woodpecker-ci.org (which uses the docker backend), I can see:
- cancel pending pipeline, agent is available: the pipeline starts anyway (this is probably #2062)
- cancel running pipeline: works in general, but the new status is failing when it should be killed
System Info
next
Additional context
No response
Validations
- [X] Read the Contributing Guidelines.
- [X] Read the docs.
- [X] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- [X] Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
- [X] Check that this is a concrete bug. For Q&A join our Discord Chat Server or the Matrix room.
Woodpecker 2.1.1, Kubernetes.
- cancel pending pipeline: removed from queue, released resources, killed pipeline status, skipped step status (Screenshot 2024-01-08 1)
- cancel running pipeline: removed from queue, released resources, success pipeline and step statuses (Screenshot 2024-01-08 2)
https://github.com/woodpecker-ci/woodpecker/issues/2253#issuecomment-2076542998
I've got a related issue, which is somewhat worrisome.
I was able to reproduce the original bug on a 2.3.0 installation with the Kubernetes backend. I've observed that it's inconsistent: sometimes cancelling will correctly show the running step as killed/cancelled and mark the pipeline as canceled. The last step to run will show "Oh no, we got some errors! Canceled" (remaining steps in the same workflow will show as grey, with the message "This step has been canceled."). Sometimes, it will show the last step to run as successful instead (and remaining steps in the same workflow will also show as grey, with the message "This step has been canceled.").
However, if you have a second workflow that depends on the first (i.e. a multi-workflow pipeline, for example ./.woodpecker/a.yml and ./.woodpecker/b.yml, where "b" depends_on "a"), and workflow "a" is cancelled and hits the bug where it's considered successful, then "b" will start running, and there is no way to cancel "b", because the cancel button will have been replaced by a Restart button ❗ This could lead to situations where an erroneous deployment is triggered and a developer is unable to stop it, for example.
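To make the scenario above concrete, here is a minimal Python sketch of the dependency gating as I understand it. It is only a model: the `Status` enum and the `may_start` helper are hypothetical names for illustration, not Woodpecker's actual types or scheduling code.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical, simplified statuses; not Woodpecker's actual types."""
    SUCCESS = "success"
    FAILURE = "failure"
    KILLED = "killed"


def may_start(dependency_statuses):
    """A dependent workflow may start only if every dependency succeeded."""
    return all(s is Status.SUCCESS for s in dependency_statuses)


# Expected: cancelling workflow "a" reports it as killed, so "b" stays blocked.
expected_a = Status.KILLED
# Observed bug: the cancelled workflow "a" is reported as success instead.
observed_a = Status.SUCCESS

b_starts_expected = may_start([expected_a])
b_starts_observed = may_start([observed_a])
```

Under this model, the misreported status is the whole problem: the gating itself behaves correctly, but it is fed a success for a workflow that was actually cancelled, so "b" is released.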
⚠️ One thing I noticed is that, consistently, if I cancelled the pipeline between steps, that is, while a pod was in the Pending state (in other words, after a step was finished, but before the logs of a new step started to stream), the bug would occur and the pipeline would be marked as successful. However, if I were to cancel it while a step is in mid-execution (so I'm certain that a Pod was in the Running state) then the step would always cancel properly, marking the step and the whole Workflow as failed. Of course, this only applies and has only been tested on the Kubernetes backend.
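I don't know the actual cause, but the behavior I would expect can be stated as a small rule: a cancel request should take precedence over whatever the step's exit code ends up being, including the case where the step has not started yet. A rough Python sketch of that rule follows; `resolve_step_status` and its string statuses are hypothetical and do not reflect Woodpecker's real implementation.

```python
from typing import Optional


def resolve_step_status(cancel_requested: bool, exit_code: Optional[int]) -> str:
    """Hypothetical status resolution; not Woodpecker's actual code."""
    # A cancel request wins, whether the step had not started yet
    # (pod still Pending, exit_code is None) or had already exited.
    if cancel_requested:
        return "killed"
    # No cancel and no exit code yet: the step has not finished.
    if exit_code is None:
        return "pending"
    return "success" if exit_code == 0 else "failure"
```

With this rule, cancelling between steps (`cancel_requested=True`, `exit_code=None`) would resolve to killed rather than success, which is the case that goes wrong for me.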
I'd share links/screenshots but this all happened within our internal servers.
Feels related:
I am running the agent in docker compose. Woodpecker 2.7.1
Had this scenario: No agents available, about 30 pipelines created by cron. Then I canceled all the 29 previous pipelines. When restarting the agent it started to run through all the canceled pipelines and execute them... Kinda scary...
Ignore my last comment, there were just way more pipelines. The canceled ones stayed canceled, sorry for the confusion.