Deployment randomly fails with "signal: killed"
What happened:
Kubernetes deployments randomly fail with the following message:
Failed to apply manifest: name="***", kind="<***>", namespace="***", apiVersion="apps/v1" (failed to apply: (signal: killed))
It happens randomly, on random manifests (it can be a Deployment, ConfigMap, Secret, etc.). Sometimes the rollback also fails with the same message. Re-triggering a deploy with sync sometimes fixes it.
What you expected to happen: consistent deploys
How to reproduce it: happens randomly
Environment:
- piped version: 0.9.15, but it has been happening since 0.9.9
- control-plane version: same
- Others:
@m-ronchi Hi.
Thanks for your report.
Based on the error message "(signal: killed)", I think your piped pod was terminated during Kubernetes' rescheduling process. (We will improve the error message to make it more understandable.)
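For reference, "signal: killed" is the error text Go's os/exec package reports when a child process (here, presumably the kubectl invocation) dies from SIGKILL, e.g. sent by the kernel OOM killer or during a node eviction. A minimal sketch that reproduces the string (illustrative only, assuming a Unix system; this is not piped's actual code):

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	// Start a long-running child, then SIGKILL it, the way the OOM
	// killer or a node eviction would kill a kubectl process.
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	_ = cmd.Process.Signal(syscall.SIGKILL)

	// Wait returns an *exec.ExitError whose message is the signal name.
	err := cmd.Wait()
	fmt.Println(err) // prints: signal: killed
}
```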
May I ask a few questions?
- Did the piped pod restart after it got that error?
- Were the running deployments resumed after that, or did you have to re-trigger them manually?
- No, the pod continued to run (unrelated: I did manually restart it to try to fix #1934, which happened again). I think the kubectl process was killed and piped didn't handle it properly.
- The deployments had failed (and the rollback failed too). When I synced the app manually from the frontend, they restarted.

I did find an unconstrained pod on the node that was running piped, though. Still, this kind of non-deterministic failure should be retried, especially when rolling back, as it can leave the cluster in an inconsistent state. (On Unix, a killed process has exit code = 128 + [signal number]; you can use that to distinguish OS-level kills from kubectl errors. A sketch of that idea follows below.)
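A minimal sketch of the retry-on-signal suggestion above (hypothetical code, not PipeCD's implementation; `applyWithRetry` and the bare `kubectl apply` invocation are assumptions). Note that the 128 + signal-number convention applies when reading an exit status back through a shell; a Go parent that waits on the child directly sees the signal in the wait status instead, so the check below uses `WaitStatus.Signaled()`:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// applyWithRetry runs `kubectl apply` and retries only when the child
// process was killed by a signal (a transient, OS-level failure such as
// an OOM kill), while genuine kubectl errors fail immediately.
// Hypothetical sketch, not PipeCD's actual code.
func applyWithRetry(manifest string, maxRetries int) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		cmd := exec.Command("kubectl", "apply", "-f", manifest)
		err = cmd.Run()
		if err == nil {
			return nil
		}
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
				// Killed by a signal: transient, retry with a simple backoff.
				fmt.Printf("kubectl killed by %v, retrying (%d/%d)\n",
					ws.Signal(), attempt+1, maxRetries)
				time.Sleep(time.Duration(attempt+1) * time.Second)
				continue
			}
		}
		// Non-zero exit from kubectl itself: not transient, give up.
		return err
	}
	return err
}

func main() {
	if err := applyWithRetry("deployment.yaml", 3); err != nil {
		fmt.Println("apply failed:", err)
	}
}
```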
Hi @m-ronchi, various things have changed since then, but I hope things are going well with PipeCD. Could you confirm whether this issue still exists? Has it been resolved already?
Staled!