opentelemetry-operator
Embed a PodTemplate in the OpentelemetryCollector CRD
The OpentelemetryCollector CRD currently defines a few configuration options for env-var injection from ConfigMap or Secret resources and a couple of other options. But it does not expose most of the configuration of a Pod: You cannot control
- the Pod's annotations (https://github.com/open-telemetry/opentelemetry-operator/issues/900)
- its resource limits and ranges (https://github.com/open-telemetry/opentelemetry-operator/issues/895)
- details of the storage configuration (https://github.com/open-telemetry/opentelemetry-operator/issues/897)
- imagePullSecrets (https://github.com/open-telemetry/opentelemetry-operator/issues/846)
- liveness probe settings (https://github.com/open-telemetry/opentelemetry-operator/issues/760)
- ...
These issues share a common underlying problem: the operator does not use a PodTemplate in the OpentelemetryCollector resource. Instead it constructs the Deployment, including its embedded pod template (spec.template), entirely from individual configuration settings.
That approach won't scale and isn't maintainable. What about security policies? What about anything else you can name that can appear in a pod?
The future-proof solution is to add an optional PodTemplate to the OpentelemetryCollector CRD and deprecate the current keys image, serviceAccount, env, etc.
For example, this
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  image: my-registry/opentelemetry-collector-contrib:vA.B.C
  upgradeStrategy: none
  mode: deployment
  serviceAccount: otel-collector
  env:
    - name: MYAPP_ENVIRONMENT
      valueFrom:
        configMapKeyRef:
          name: some-configmap
          key: ENVIRONMENT
  config: |
    ... yaml blob here ...
```
would be spelled
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  upgradeStrategy: none
  template:
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otc-container
          image: my-registry/opentelemetry-collector-contrib:vA.B.C
          env:
            - name: MYAPP_ENVIRONMENT
              valueFrom:
                configMapKeyRef:
                  name: some-configmap
                  key: ENVIRONMENT
  config: |
    ... yaml blob here ...
```
The opentelemetry-operator would validate that, if present, spec.template.spec.containers has at least one entry named exactly otc-container (unless a top-level containerName setting overrides that name).
It would then inject the appropriate args, create or append its own entries in the env array, default the image if one is not already specified, add its livenessProbe and volumeMounts, and so on.
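To make that injection concrete, here is a hedged sketch of what the rendered container in the generated Deployment might look like. The argument, volume name, and mount path are assumptions for illustration, not the operator's documented defaults.

```yaml
# Illustrative only: what the operator could render from the otc-container entry above.
containers:
  - name: otc-container
    image: my-registry/opentelemetry-collector-contrib:vA.B.C   # kept from the user's template
    args:
      - --config=/conf/collector.yaml          # injected; the path is an assumption
    env:
      - name: MYAPP_ENVIRONMENT                # kept from the user's template
        valueFrom:
          configMapKeyRef:
            name: some-configmap
            key: ENVIRONMENT
    volumeMounts:
      - name: otc-internal                     # injected; mounts the generated collector config (name assumed)
        mountPath: /conf
    livenessProbe:                             # injected when the health_check extension is configured
      httpGet:
        path: /
        port: 13133
```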
This has many advantages:
- Patches from tools like Kustomize just work (see the sketch after this list)
- Tools can discover and rewrite the container image references at the standard locations automatically
- Support for new pod configuration is inherited without new CRD options being added
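To illustrate the first two points, a kustomization.yaml could look like the following. This is a sketch: the file names are placeholders, the spec.template field is the proposal rather than an existing API, and it assumes Kustomize's default image field specs match the spec/template/spec/containers path on this custom resource.

```yaml
# kustomization.yaml - illustrative sketch against the proposed spec.template field
resources:
  - otel-collector.yaml

# Rewrite the collector image the same way you would for a Deployment.
images:
  - name: my-registry/opentelemetry-collector-contrib
    newTag: vX.Y.Z

# Merge extra pod metadata that the CRD does not expose today.
patches:
  - target:
      kind: OpenTelemetryCollector
      name: otel
    patch: |-
      apiVersion: opentelemetry.io/v1alpha1
      kind: OpenTelemetryCollector
      metadata:
        name: otel
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
```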
Existing CRD options could be retained for backwards compatibility, avoiding the need for an incompatible API version bump. Alternatively, this could move to apiVersion: opentelemetry.io/v1alpha2.
Here's an expanded example of what the PodTemplate based OpentelemetryCollector could configure:
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  upgradeStrategy: none
  template:
    spec:
      serviceAccountName: otel-collector
      imagePullSecrets:
        - name: myregistrykey
      containers:
        - name: collector
          image: my-registry/opentelemetry-collector-contrib:vA.B.C
          imagePullPolicy: Always
          env:
            - name: MYAPP_ENVIRONMENT
              valueFrom:
                configMapKeyRef:
                  name: some-configmap
                  key: ENVIRONMENT
          envFrom:
            - configMapRef:
                name: inject-all-env-from-this-configmap
            - secretRef:
                name: inject-all-env-from-this-secret
          readinessProbe:
            failureThreshold: 30
            initialDelaySeconds: 5
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 3
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsGroup: 1337
            runAsNonRoot: true
            runAsUser: 1337
      initContainers:
        - name: some-init-container
          image: foo
          args: ["do", "stuff", "specific", "to", "my", "k8s"]
  config: |
    ... yaml blob here ...
```
I'm pretty new to the stack but I could possibly attempt this, if I had some indication the idea would be welcomed. It doesn't look excessively complex.
But I'd really want project owner feedback before I launched into it.
I agree with the design if the objective is to be as flexible as possible. However, the goal of a CRD is to abstract deployment and operational details. Exposing the pod spec in the CRD gives a lot of flexibility, but it is more error-prone.
@pavolloffay Yeah, that's definitely a concern. And there are some places where the operator will have to override what's in the pod spec.
But if you look at the list of linked issues, there are lots of different things people are asking for that boil down to "expose elements of the Pod and Container specs for configuration via the operator". That currently involves custom CRD additions and code for each such configuration element.
An alternative would be to cherry-pick subsections like the resources and securityContext keys, add podAnnotations and podLabels maps, etc. But that won't help with the volumes and volumeMounts people may need to inject additional certificates; the list goes on.
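For instance, mounting an extra CA bundle, one of the cases cherry-picked keys would not cover, could look like this under the proposed template field. This is a sketch: the template field does not exist in the released CRD, and the ConfigMap name and mount path are made-up examples.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  template:
    spec:
      volumes:
        - name: extra-ca              # example ConfigMap holding additional certificates
          configMap:
            name: corporate-ca-bundle
      containers:
        - name: otc-container
          volumeMounts:
            - name: extra-ca
              mountPath: /etc/ssl/extra-ca
              readOnly: true
  config: |
    ... yaml blob here ...
```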
The more I think about it the more I think this should really be something the operator-sdk helps operators out with. But it'd be worth doing here first.
I'm going to withdraw this, as I've retired my use of the opentelemetry-operator in favour of directly managed manifests.
Would you like the issue left open in case someone else wants to adopt it, or just close it?
We can keep it open and see what other people think.
What was your main pain point that made you move to plain manifests?
I'm currently evaluating this operator, and one pain point I'm having is that there is no ReadinessProbe. I've seen how to provide that (it's a one-line code change), but having the PodTemplate described in this issue would allow me to do it directly in configuration today with an official release. Right now I either contribute the change, get it accepted, and wait for the next release to include it, or I have to fork this project to complete our testing.
@miguelbernadi could you please share your readiness probe definition for the OTELcol?
We have several tenants with diverse backgrounds and needs in our systems, and we want to evaluate OTEL as a scheme to simplify our internal observability pipelines and custom data processors. So we need to reproduce and support the current setups, just replacing the internal tooling, before we can extend into the other features available in OTEL.
The reason we want to use ReadinessProbes is that we prefer the Deployment approach, and when we change configurations or scale replicas out/in, we may lose data if the pods are not ready but the rollout continues. We are choosing the Deployment approach for now because we need to aggregate some infrastructure metrics currently present in an existing Prometheus server, plus application metrics sent with statsd. The results should be sent to each tenant's DataDog and GrafanaCloud accounts: for some tenants to just one of them, and for others to both.
I'm using the health_check extension to get a health endpoint, and the operator automatically adds a LivenessProbe to my pods:
```yaml
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 13133
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```
It is enough for me to add an identical ReadinessProbe:
```yaml
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 13133
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```
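For reference, the endpoint both probes hit comes from the collector's health_check extension; a minimal configuration fragment (the address shown is the extension's documented default) looks like:

```yaml
# Collector configuration fragment enabling the health endpoint on port 13133.
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
service:
  extensions: [health_check]
```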
We may need to add other requirements to these manifests once we go into production, so a PodTemplate would allow us to set these values as configuration in the CRD itself instead of requiring code changes. We will eventually need resources and securityContext as well, though our current testing effort does not get into them.
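With the proposed template field (not part of the released CRD at the time of writing), that readiness probe could be declared directly in the custom resource; a sketch:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  template:
    spec:
      containers:
        - name: otc-container
          readinessProbe:
            httpGet:
              path: /
              port: 13133
            periodSeconds: 10
  config: |
    ... yaml blob here ...
```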
Another related issue: https://github.com/open-telemetry/opentelemetry-operator/issues/1684
^ Yes, I filed the above. I agree that overriding the entire podspec is maybe a bit too much, given how useful it is for the operator to manage a lot of the defaults (which would probably mean that the podspec overrides would have to be intelligently merged, which is hard to do well and probably comes with various footguns).
It does seem that, in general, k8s operators end up exposing knobs on most of the pod spec fields eventually, because questions about overriding X or Y always come up at scale :)
Re: the initial comment:
> The future-proof solution is to add an optional PodTemplate to the OpentelemetryCollector CRD and deprecate the current keys image, serviceAccount, env, etc.

This approach, mutating a podspec given by the user, as opposed to the user mutating the default podspec provided by the operator, could be interesting. I can see it being a bit difficult to explain that, e.g., you have to make sure there's a container named otc-container, but it's not so bad.
I'm convinced this is actually a fundamental defect in the concept of how operators are designed and used in k8s.
It's completely impractical to have each operator expose its own custom, CR-specific configuration for each of these. But right now there's no sensible k8s base type to embed in a CR to allow their configuration in a generic way. And it gets even worse if a CR defines multiple different resources, where you might have to apply some specific annotation/label/setting to only one of the operator-deployed sub-resources but not others...
So if you use operators that define workloads you're pretty much forced to introduce mutating webhooks to munge what the operator deploys into what you actually need. Environments will need to control security settings like capabilities flags, apparmor or seccomp filters; CSP-specific annotations for managed identity/pod authorization; service mesh annotations; resource requests/limits; etc. But then you can face issues with the operator entering an infinite reconciler loop if it notices the changes made by the webhook and decides it needs to update the object, as I've seen with env-var injecting webhooks amongst others. It's a nightmare.
This isn't specific to the otel operator. It's really something the k8s SIGs need to deal with. But I still think embedding a PodTemplate is the least-bad way to work around it until or unless the base k8s operator design and sample implementation are improved with a proper solution to the problem. I'm no longer offering to work on it for the otel operator, though, as I had to retire my use of it due to the aforementioned issues. My org has had to replace many common operators recently, or fork them with org-specific patches, due to issues with controlling resources or security-related configs.
The Prometheus operator supports this by allowing a containers[] list to be added to the Prometheus resource. If a container's name matches one of those pre-defined by the operator, the two are strategic-merged; other containers are added. So it can support auth sidecars and the like this way.
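For comparison, a sketch of that Prometheus resource shape. The spec.containers field and the generated container name prometheus come from the Prometheus operator; the probe tweak and the sidecar below are illustrative placeholders.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  containers:
    # Name matches the operator-generated "prometheus" container,
    # so these fields are strategic-merged into it.
    - name: prometheus
      readinessProbe:
        periodSeconds: 5
    # No generated container has this name, so it is added as a sidecar.
    - name: auth-proxy
      image: my-registry/some-auth-proxy:vA.B.C
```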
Hello! I just merged a PR for our v2 spec that will have common fields for any of our CRDs that deploy pods. We plan on keeping that spec as close to the full pod template as possible going forward. Please refer to our v2 milestone if there are fields you find missing.