
Helm Chart: uninstall not completing cleanly

Open hostalp opened this issue 2 years ago • 5 comments

Describe the bug

After installing the fluent-operator (1.0.0) via Helm and then uninstalling it, some resources aren't removed cleanly and become stuck in the Terminating state. I understand that Helm doesn't delete CRDs (see #132), but I still have to clean up other resources manually to get everything fully removed. This shouldn't be necessary.

To Reproduce

helm upgrade --install fluent-operator --create-namespace -n logging https://github.com/fluent/fluent-operator/releases/latest/download/fluent-operator.tgz -f fluent-operator-values_sandbox.yaml

With fluent-operator-values_sandbox.yaml containing:

#Set this to containerd or crio if you want to collect CRI format logs
containerRuntime: docker
Kubernetes: true

operator:
  resources:
    limits:
      cpu: 100m
      memory: 50Mi
    requests:
      cpu: 10m
      memory: 20Mi

fluentbit:
  # fluentbit resources. If you do want to specify resources, adjust them as necessary
  #You can adjust it based on the log volume.
  resources:
    limits:
      cpu: 500m
      memory: 200Mi
    requests:
      cpu: 10m
      memory: 25Mi
  #Set a limit of memory that Tail plugin can use when appending data to the Engine.
  #If the limit is reached, it will be paused; when the data is flushed it resumes.
  #if the inbound traffic is less than 2.4Mbps, setting memBufLimit to 5MB is enough
  #if the inbound traffic is less than 4.0Mbps, setting memBufLimit to 10MB is enough
  #if the inbound traffic is less than 13.64Mbps, setting memBufLimit to 50MB is enough
  input:
    tail:
      memBufLimit: 5MB

fluentd:
  enable: true
  replicas: 1
  forward:
    host: "logging.svc"
  watchedNamespaces:
    - kube-system
  resources:
    limits:
      cpu: 1000m
      memory: 256Mi
    requests:
      cpu: 20m
      memory: 64Mi
  #Configure the output plugin parameter in Fluentd.
  #You can set enable to true to output logs to the specified location.
  output:
    es:
      enable: true
      host: logs1.local.loc
      port: 9200
      logstashPrefix: k8s_logs_sb
      buffer:
        enable: true
        type: file
        path: /buffers/es

(The problem also occurs with a much simpler config in which only fluent-bit is enabled, with es output.)

The subsequent uninstall (with none of our own fluent-operator related resources added between install and uninstall) doesn't complete cleanly, so I have to run something like the following:

helm uninstall fluent-operator -n logging
kubectl patch -n logging FluentBit fluent-bit -p '{"metadata":{"finalizers":[]}}' --type=merge

kubectl delete crd $(kubectl get crd -o jsonpath='{.items[?(@.spec.group=="fluentd.fluent.io")].metadata.name}')
kubectl delete crd $(kubectl get crd -o jsonpath='{.items[?(@.spec.group=="fluentbit.fluent.io")].metadata.name}')

kubectl delete namespace logging

Ignoring CRDs (as mentioned above), I still have to clear finalizers of the FluentBit type resource fluent-bit, otherwise it would remain stuck in the terminating state.
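
To confirm that a finalizer is what blocks the deletion, it can help to print the resource's deletion timestamp together with its finalizers. The following is only a minimal sketch, assuming the resource is fluent-bit in the logging namespace as above; the helper name show_finalizers is hypothetical and not part of fluent-operator or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the deletionTimestamp and
# the finalizers of a resource, one per line.
# Usage: show_finalizers <kind> <name> <namespace>
show_finalizers() {
  kind="$1"; name="$2"; ns="$3"
  # A non-empty deletionTimestamp together with a non-empty finalizers list
  # means the object is waiting for a controller to remove its finalizers.
  kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# Example (requires cluster access):
# show_finalizers fluentbit fluent-bit logging
```

If the first line is a timestamp and the second line is not empty, the resource is in the stuck state described here.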

Expected behavior

The FluentBit resource fluent-bit would be cleanly removed during Helm uninstall.

Your Environment

- Fluent Operator version: Helm Chart 1.0.0
- Container Runtime: docker, Kubernetes 1.20.15

How did you install fluent operator?

helm upgrade --install fluent-operator --create-namespace -n logging https://github.com/fluent/fluent-operator/releases/latest/download/fluent-operator.tgz -f fluent-operator-values_sandbox.yaml

What happened?

No response

Your Error Log

N/A

Additional context

No response

May 10 '22 20:05 hostalp

This is caused by Helm: it deletes the CRs in no particular order, which can leave a CR stuck. fluent-operator runs its own cleanup when a fluentd or fluentbit resource is deleted; see the code here: https://github.com/fluent/fluent-operator/blob/master/controllers/fluent_controller_finalizer.go

May 11 '22 10:05 wenchajun

So the problem is the order in which the CRs are uninstalled? You could try the workaround with a List: https://github.com/helm/helm/issues/8439#issuecomment-1068979423 https://github.com/kubernetes/kubernetes/blob/master/hack/testdata/list.yaml However, I'm not sure whether uninstall would respect that order (in reverse); perhaps not.

Other than waiting for https://github.com/helm/helm/issues/8439 to be implemented, a pre-delete Helm chart hook might help to at least make sure everything gets deleted.
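
One way to read the suggestion above: delete the custom resources while the operator is still running, so it can process their finalizers, and only then uninstall the release. Below is a rough sketch of what a pre-delete hook or a manual script might run. The function name uninstall_in_order is hypothetical, and the resource types fluentbit and fluentd and the 60s timeout are assumptions to adjust for the CRs actually present in the cluster; this has not been verified against the chart.

```shell
#!/bin/sh
# Hypothetical helper: remove the operator-managed CRs first, then uninstall
# the Helm release.
# Usage: uninstall_in_order <release> <namespace>
uninstall_in_order() {
  release="$1"; ns="$2"
  # Assumed resource types; extend the list if other CR kinds are in use.
  for kind in fluentbit fluentd; do
    # kubectl delete waits for finalizers by default; --timeout bounds the wait.
    kubectl delete "$kind" --all -n "$ns" --timeout=60s || return 1
  done
  helm uninstall "$release" -n "$ns"
}

# Example (requires cluster access):
# uninstall_in_order fluent-operator logging
```

The function stops at the first failed delete, so the operator is not uninstalled while a CR is still waiting on it.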

May 12 '22 19:05 hostalp

Yes, it might be possible to add a preprocessing step that determines the order in which Helm uninstalls the resources; this could be a new feature.

May 13 '22 02:05 wenchajun

I have the same problem. When I run "helm install fluent-operator fluent-operator -f fluent-operator/values.yaml", the fluent-bit, fluentd and fluent-operator pods all reach the Running state. But when I run "helm uninstall fluent-operator", only the fluent-operator pods are removed; the fluent-bit and fluentd pods keep running. If I then run "helm install fluent-operator fluent-operator -f fluent-operator/values.yaml" again, the fluent-bit and fluentd pods are deleted. This seems odd: all the fluent pods should be removed by the first helm uninstall.

Sep 08 '22 06:09 666MrFang

I have the same issue; the cause is the fluent-bit resource of the FluentBit CRD. To force deletion, you need to remove its finalizers section:

$ kubectl edit fluentbit fluent-bit -n fluent

Sep 28 '23 08:09 ajax-bychenok-y