tobs icon indicating copy to clipboard operation
tobs copied to clipboard

Opentelemetry Collector is not removed on stack uninstall

Open paulfantom opened this issue 3 years ago • 8 comments
trafficstars

What happened?

Run tobs uninstall and otel collector deployment was still running.

Did you expect to see something different?

All components should be removed

How to reproduce it (as minimally and precisely as possible):

tobs install -y && tobs uninstall

Environment

  • tobs installation method:
  • [x] cli
  • [x] helm chart
  • tobs version:

    0.10.0

  • Kubernetes version information:

    1.23

  • Kubernetes cluster kind:

kind

paulfantom avatar May 12 '22 16:05 paulfantom

Seems like operator either doesn't have enough time to remove the collector Deployment or it doesn't have this feature coded.

This issue may slip to next milestone.

paulfantom avatar May 30 '22 12:05 paulfantom

This is happening because we are annotating the OpentelemetryCollector CR as a helm hook. And by default, helm hooks are not managed with the releases, i.e. they don't get removed when uninstall happens [1].

The only solution I can see now is instructing users to manually delete the CR (+ other resources annotated as hook, if required) similar to how we instruct them for secrets etc. WDYT about this @paulfantom ?

[1] https://helm.sh/docs/topics/charts_hooks/#hook-resources-are-not-managed-with-corresponding-releases

onprem avatar Jun 30 '22 12:06 onprem

I agree, we'll need to document the uninstallation process in detail and point to helm shortcomings as of why this is needed. I also think it would be good to instruct users to use a dedicated namespace when using tobs as removal would be trivial at that point (simple kubectl delete ns <name> should suffice).

Either way let's put this in a backlog for now.

paulfantom avatar Jun 30 '22 14:06 paulfantom

I wonder if we can potentially tie the CR to some other object with kubernetes metadata.ownerReferences. :thinking: This way we would move removal responsibility from helm to kubernetes.

paulfantom avatar Jun 30 '22 14:06 paulfantom

Just adding to this, it makes it pretty impossible to run multiple tests using ct CLI tool. Not only is opentelemetry-collector left behind, there are many ConfigMap and Secret objects left behind. When ct moves on to n+1 test, it will fail due to these objects being left behind.

nhudson avatar Aug 09 '22 20:08 nhudson

timescale/tobs#535 is related as it instructs the user to manually delete the opentelemetrycollector CR manually.

nhudson avatar Aug 10 '22 22:08 nhudson

After some research and testing I don't think we can use the metadata.ownerReferences to make this work. This is due to the uid field that is required, and the uid has to be the uid of the resource that owns what we are setting.

So for example if we set the onwerReferences to the Deployment for the otel-operator, we would need a way to look up the uid of that Deployment when we apply the OpenTelemetryCollector and set it. From what I understand of the lookup() function in Helm this will never be possible, during a template, install, upgrade, delete etc. Maybe I am mis-reading this.

docs

Keep in mind that Helm is not supposed to contact the Kubernetes API Server during a helm template or a helm install|upgrade|delete|rollback --dry-run, so the lookup function will return an empty list (i.e. dict) in such a case.

nhudson avatar Aug 10 '22 23:08 nhudson

Related otel issue: https://github.com/open-telemetry/opentelemetry-helm-charts/issues/69

nhudson avatar Aug 11 '22 14:08 nhudson

The main difference between otel-operator and prometheus-operator is that when the prometheus-operator is removed, either via Helm chart or through the release manifest, the CRD is removed. With otel-operator the CRD is in left intact after the operator is removed. This will leave any collectors that are configured.

nhudson avatar Aug 17 '22 22:08 nhudson

@paulfantom what if we switched from using the operator to using the collector Helm chart instead? I assume that at some point we would want to add support for OpenTelemetryInstrumentation ?

nhudson avatar Aug 17 '22 22:08 nhudson

We cannot switch from operator helm chart as we want to have an ability to use auto-instrumentation feature provided only by the operator.

paulfantom avatar Aug 29 '22 13:08 paulfantom