SCRIPT_RUN_ROLLBACK fails when a pipeline contains multiple SCRIPT_RUN stages.
What happened:
If a rollback is performed on a deployment that specifies multiple SCRIPT_RUN stages, the SCRIPT_RUN_ROLLBACK stage fails.
What you expected to happen:
The SCRIPT_RUN_ROLLBACK stage finishes successfully.
How to reproduce it:
Run a deployment with multiple SCRIPT_RUN stages, and cancel it after some of them have started executing.
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: script-run-like-jenkins
  labels:
    env: example
    team: product
  pipeline:
    stages:
      - name: SCRIPT_RUN
        with:
          run: |
            sh script.sh
          onRollback: |
            echo rollback
      - name: SCRIPT_RUN
        with:
          run: |
            sleep 10
            sh script.sh
          onRollback: |
            echo $SR_DEPLOYMENT_ID
            echo $SR_APPLICATION_ID
            echo $SR_APPLICATION_NAME
            echo $SR_TRIGGERED_AT
            echo $SR_TRIGGERED_COMMIT_HASH
            echo $SR_REPOSITORY_URL
            echo $SR_SUMMARY
            echo $SR_CONTEXT_RAW
            sh script.sh
      - name: SCRIPT_RUN
        with:
          run: |
            sleep 10
            sh script.sh
Environment:
- piped version:
- control-plane version:
- Others:
[root cause]
The error occurs when piped tries to write the stage log for a SCRIPT_RUN_ROLLBACK stage that has already completed.
piped uses the stage ID to identify which stage the log belongs to.
The ID of each predefined stage is a constant value.
- https://github.com/pipe-cd/pipecd/blob/c502a27693e5c4cad1748e6f088fbd316e13f835/pkg/app/piped/planner/predefined_stages.go#L33-L74
So if a predefined stage appears more than once, all instances share the same ID and piped resolves that ID to the instance that has already completed.
I tried adding a suffix to the stage ID of the SCRIPT_RUN_ROLLBACK stage, like this: https://github.com/pipe-cd/pipecd/commit/7a475a687e9eed85cd105a542db4b663f8415f66
But that failed during rollback.
The error comes from looking up the stage config by stage ID when the stage is executed. https://github.com/pipe-cd/pipecd/blob/7a475a687e9eed85cd105a542db4b663f8415f66/pkg/app/piped/controller/scheduler.go#L532-L547
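To illustrate why the suffix approach breaks the lookup, here is a minimal sketch in Go. It is not the actual piped code: `predefinedStageConfigs`, `findPredefinedStage`, and the ID string `ScriptRunRollback` are hypothetical names chosen for illustration. It only assumes that predefined stage configs are looked up by an exact, constant ID.

```go
package main

import "fmt"

// predefinedStageConfigs is a hypothetical stand-in for the table of
// predefined stages. Each entry is keyed by a constant stage ID.
var predefinedStageConfigs = map[string]string{
	"ScriptRunRollback": "SCRIPT_RUN_ROLLBACK",
}

// findPredefinedStage looks up a predefined stage config by exact ID.
// The boolean is false when no config is registered under that ID.
func findPredefinedStage(id string) (string, bool) {
	cfg, ok := predefinedStageConfigs[id]
	return cfg, ok
}

func main() {
	// The first ID is the constant one; the second carries a suffix
	// that was added to make the stage ID unique.
	for _, id := range []string{"ScriptRunRollback", "ScriptRunRollback-1"} {
		if name, ok := findPredefinedStage(id); ok {
			fmt.Printf("%s: found config for %s\n", id, name)
		} else {
			fmt.Printf("%s: no stage config found\n", id)
		}
	}
}
```

In this sketch the suffixed ID no longer matches any registered key, which mirrors the failure seen during rollback.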
Currently, the SCRIPT_RUN_ROLLBACK stage is a predefined stage, and the design assumes that a pipeline may contain several of them. But we should change the spec so that only one SCRIPT_RUN_ROLLBACK stage is executed, for the following reasons.
- When storing the stage log, the target stage is identified by the stage ID.
- If multiple stages share the same ID, writing to the remaining stages fails as soon as one of them completes, because a stage log cannot be updated once it is completed.
- The config of a predefined stage is looked up by its unique ID, so we can't modify the stage ID.
In short, this problem is caused by duplicate stage IDs.
We plan to fix the stage ID duplication in the plugin-architecture piped (pipedv1).