chaosblade
Deleting a pod network-delay rule fails after the pod restarts
Issue Description
bug report
Describe what happened (or what feature you want)
When I set a network delay rule for a pod, the delay makes the pod's liveness probe fail, and the pod is restarted. At this point, if I try to delete the network delay rule, the deletion fails, because the container ID changes when the pod restarts. The destroy operation still uses the original container ID recorded for the rule to remove the pod's network delay.
Describe what you expected to happen
So the container ID alone is not a good identifier for the specified rules. When a delete fails, we should check whether the container ID in the Identifier has changed.
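As a rough sketch of the idea above (this is not chaosblade's actual code; the function name and the fallback logic are hypothetical), the destroy path could compare the container ID recorded in the experiment's Identifier against the pod's current container ID and fall back to the live one when they differ:

```python
def resolve_destroy_target(identifier: str, current_container_id: str) -> str:
    """Return the container ID a destroy command should target.

    `identifier` follows the format seen in the experiment status:
    namespace/nodeIP/podName/containerName/containerId
    `current_container_id` is the ID the cluster reports for the pod now
    (e.g. looked up from the pod's status.containerStatuses).
    """
    # The recorded ID is the last segment of the Identifier.
    recorded_id = identifier.rsplit("/", 1)[-1]
    if recorded_id != current_container_id:
        # The pod restarted since the rule was applied, so the recorded ID
        # is stale; target the live container instead.
        return current_container_id
    return recorded_id


# Example using the Identifier from this report; the second call assumes the
# pod restarted under a new (made-up) container ID.
ident = ("test-testing-dc-k2030/172.20.35.51/"
         "reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce")
print(resolve_destroy_target(ident, "18f0b9d032ce"))  # unchanged -> 18f0b9d032ce
print(resolve_destroy_target(ident, "aaaa1111bbbb"))  # restarted -> aaaa1111bbbb
```

This would only repair the destroy path; the delay itself would still need to be cleaned up in the new container, since the `tc` rules injected into the old container do not carry over intact state tracking after a restart.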
How to reproduce it (as minimally and precisely as possible)
- First, deploy a network delay rule for a pod:
Status:
Exp Statuses:
Action: delay
Res Statuses:
Id: b42b0ee218262ce9
Identifier: test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
Kind: pod
State: Success
Success: true
Scope: pod
State: Success
Success: true
Target: network
Phase: Running
Events: <none>
- Make sure the delay causes the pod's liveness probe to fail and the pod to restart:
test-testing-dc-k2030 reliable-msg-route-5fdc8cc757-hwvdt 1/1 Running 4 3d 192.168.137.81 172.20.35.51 <none> <none>
- Delete the rule; it fails because the recorded container no longer exists:
Status:
Exp Statuses:
Action: delay
Error: see resStatus for the error details
Res Statuses:
Error: Error response from daemon: No such container: 18f0b9d032ce
Id: b42b0ee218262ce9
Identifier: test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
Kind: pod
State: Error
Success: false
Scope: pod
State: Success
Success: false
Target: network
Phase: Destroying
- If I force-delete the rule, the delay rule actually remains active in the pod.
Tell us your environment
k8s v1.16.15 chaosblade-operator-v0.9.0
Anything else we need to know?
You can set the --daemonset-enable=false flag to disable the sidecar mode when deploying chaosblade-operator, which works around the problem.
I see the default value of this parameter is already false.
You can delete the pod to recover from it. I will fix this problem later.
Actually, it works if I apply the rule again using --force, and I can then successfully delete the rule before the pod's next restart. But I don't think that is a proper way of handling it, so I am reporting the bug.
@xcaspar I am using chaosblade-operator-v1.3.0 and k8s v1.21.4 and still face this issue. Will there be a fix in the next release, or is there a workaround to bypass this issue? Thanks.