
Deleting a pod-network-delay rule fails after the pod restarts

Open bmbbms opened this issue 3 years ago • 5 comments

Issue Description

bug report

Describe what happened (or what feature you want)

When I set a network delay rule for a pod, the pod's liveness probe fails and the pod is restarted. If I then try to delete the network delay rule, the deletion fails because the containerId changes when the pod restarts, while the rule still uses the original containerId to remove the pod network delay.

Describe what you expected to happen

So the containerId is not a reliable way to identify the target of a rule. We should check whether the containerId in the Identifier has changed when a delete fails.
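A minimal sketch of that check, in Python for brevity. It assumes the Identifier layout is `namespace/node/pod/containerName/containerId`, which is inferred only from the status output in this issue, and that the current container ID comes from the pod's container status in the usual `<runtime>://<full id>` form. The names `RuleIdentifier`, `parse_identifier` and `container_id_changed` are hypothetical and are not part of chaosblade-operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleIdentifier:
    namespace: str
    node: str
    pod: str
    container_name: str
    container_id: str


def parse_identifier(identifier: str) -> RuleIdentifier:
    # Assumed layout (inferred from the status output in this issue):
    # namespace/node/pod/containerName/containerId
    parts = identifier.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected identifier layout: {identifier!r}")
    return RuleIdentifier(*parts)


def container_id_changed(identifier: str, current_container_id: str) -> bool:
    recorded = parse_identifier(identifier).container_id
    # Pod status reports IDs as "<runtime>://<full id>"; drop the scheme part.
    current = current_container_id.split("://", 1)[-1]
    # The Identifier stores a shortened ID, so compare it as a prefix.
    return not current.startswith(recorded)
```

If `container_id_changed` returns true after a failed delete, the operator could look up the new container by pod and container name instead of reusing the stale ID.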

How to reproduce it (as minimally and precisely as possible)

  1. First, deploy a network delay rule for a pod:
        Status:
          Exp Statuses:
            Action:  delay
            Res Statuses:
              Id:          b42b0ee218262ce9
              Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
              Kind:        pod
              State:       Success
              Success:     true
            Scope:         pod
            State:         Success
            Success:       true
            Target:        network
          Phase:           Running
        Events:            <none>
  1. Make sure the delay causes the pod's liveness probe to fail so that the pod restarts:
test-testing-dc-k2030         reliable-msg-route-5fdc8cc757-hwvdt               1/1     Running            4          3d      192.168.137.81    172.20.35.51   <none>           <none>
  1. Delete the rule:

Status:
  Exp Statuses:
    Action:  delay
    Error:   see resStatus for the error details
    Res Statuses:
      Error:       Error response from daemon: No such container: 18f0b9d032ce
      Id:          b42b0ee218262ce9
      Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
      Kind:        pod
      State:       Error
      Success:     false
    Scope:         pod
    State:         Success
    Success:       false
    Target:        network
  Phase:           Destroying

  1. If I force-delete the rule, the delay rules actually remain in the pod.
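The failure in step 3 can be recognized from the status itself. The following is only a sketch in Python: it assumes the resource status is available as a dict with the hypothetical keys `error` and `identifier` (the real field names in chaosblade-operator may differ), and it matches the Docker daemon message shown above.

```python
import re
from typing import Optional

# Matches the daemon message seen in the status output of this issue.
NO_SUCH_CONTAINER = re.compile(r"No such container: (\S+)")


def stale_container_id(res_status: dict) -> Optional[str]:
    """Return the missing container ID if it is the one recorded in the identifier."""
    match = NO_SUCH_CONTAINER.search(res_status.get("error", ""))
    if match is None:
        return None
    missing = match.group(1)
    # The recorded container ID is the last "/"-separated segment of the identifier.
    recorded = res_status.get("identifier", "").rsplit("/", 1)[-1]
    return missing if missing == recorded else None
```

A non-`None` result means the delete failed only because the recorded container no longer exists, which is the case where refreshing the container ID before retrying would make sense.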

Tell us your environment

k8s v1.16.15, chaosblade-operator-v0.9.0

Anything else we need to know?

bmbbms, Mar 31 '21 09:03

You can set the --daemonset-enable=false flag to turn off the sidecar mode when deploying chaosblade-operator to solve the problem.

xcaspar, Mar 31 '21 09:03

I see that the default value of this parameter is false.

bmbbms, Mar 31 '21 09:03

You can delete the pod to recover it. I will solve this problem later.

xcaspar, Mar 31 '21 09:03

Actually, it works if I apply the rule again using --force; I can then delete the rule successfully, as long as I do so before the pod's next restart. But I don't think that is a good way to handle it, so I am reporting the bug.

bmbbms, Mar 31 '21 09:03

@xcaspar I am using chaosblade-operator-v1.3.0 and k8s v1.21.4 and still face this issue. Will there be a fix in the next release, or is there a workaround to bypass this issue? Thanks.

yzhang559, Nov 10 '21 22:11