opentelemetry-operator icon indicating copy to clipboard operation
opentelemetry-operator copied to clipboard

Instrumentation support selector

Open crossoverJie opened this issue 1 year ago • 19 comments

Component(s)

auto-instrumentation

Is your feature request related to a problem? Please describe.

Instrumentation applies to the specified Pod, similar to this: https://skywalking.apache.org/docs/skywalking-swck/latest/java-agent-injector/#1-label-selector-and-container-matcher

Describe the solution you'd like

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  selector:
    app: appname

Describe alternatives you've considered

No response

Additional context

No response

crossoverJie avatar Mar 11 '24 10:03 crossoverJie

This approach has one considerable downside over the annotation. The annotation changes pod spec, therefore it triggers re-deployment which is needed by the OTEL operator to inject the auto-instrumentation (the operator uses pod mutating webhook)

pavolloffay avatar Mar 11 '24 10:03 pavolloffay

therefore it triggers re-deployment which is needed by the OTEL operator to inject the auto-instrumentation (the operator uses pod mutating webhook)

Yeah, you're right.

However, we hope that all the configurations associated with OTel can be handed over to Operator, and if the annotation are used, we also need to maintain them ourselves during the deployment.

crossoverJie avatar Mar 11 '24 11:03 crossoverJie

This is our usage scenario; I'll submit a PR if appropriate.

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  selector:
    app: appname
  propagators:
    - tracecontext
  java:
    image: image

crossoverJie avatar Mar 22 '24 10:03 crossoverJie

@crossoverJie we discussed this issue at the SIG (and realized we should have probably done this prior to your PR) and had a few follow up questions.

I saw the link you shared for how skywalking handles this, and at first glanced it seemed very similar, however, there's a key difference. Skywalking seems to be a java-only auto-instrumentation solution whereas opentelemetry can inject instrumentation for multiple languages. That results in what we agreed to be a confusing experience for users where a cluster admin needs to set a piece of configuration AND a tenant in the cluster needs to set configuration as well.

  1. What are the user stories for this request and how do they differ from today? With the change you've made, it seems to be a very similar end state where both cluster admin and tenant need to do actions which feels against the goal of the issue which is to put the onus of instrumentation solely in the hands of the cluster admin.

  2. What are the reasons for having multiple instrumentation resources for the same language in the same cluster? I'm not sure I understand the exact issue today that the current solution solves.

I was imagining it would be more beneficial for both personas if the cluster admin could specify a set of rules that specify a selector for applications to target for injection and specify which languages (and optionally container names) to inject. Let me know your thoughts here, and if you'd like to meet more to discuss the above I'd be happy to chat.

jaronoff97 avatar May 23 '24 21:05 jaronoff97

@jaronoff97

Thank you for following up on this issue.

That results in what we agreed to be a confusing experience for users where a cluster admin needs to set a piece of configuration AND a tenant in the cluster needs to set configuration as well.

Can you explain the relationship between cluster admin and tenants here? In my scenario, I'm the only one maintaining Instrumentation.

2. What are the reasons for having multiple instrumentation resources for the same language in the same cluster? I'm not sure I understand the exact issue today that the current solution solves.

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sit-grpc-consumer
  namespace: sit
spec:
  env:
    - name: OTEL_SERVICE_NAME
      value: grpc-consumer
  selector:
    matchLabels:
      app: grpc-consumer
  java:
    image: autoinstrumentation-java:v1
    extensions:
      - image: extensions:v1
        dir: /extensions
    env:
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.name=grpc-consumer

---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sit-grpc-provider
  namespace: sit
spec:
  env:
    - name: OTEL_SERVICE_NAME
      value: grpc-provider
  selector:
    matchLabels:
      app: grpc-provider
  java:
    image: autoinstrumentation-java:v1
    extensions:
      - image: extensions:v2
        dir: /extensions

    env:
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.name=grpc-provider

This is our usage scenario, and there are some application-specific configurations that need to be defined in different Instrumentations.

There are some other examples:

  • New versions of javaagent are only tested on specific applications first.
  • Some specific applications only need to report metrics but do not want to report traces.
  • For resource isolation, the OTEL_EXPORTER_OTLP_ENDPOINT reported by some applications may be different.

Therefore we like that each application can maintain its own Instrumentation file independently.

crossoverJie avatar May 24 '24 02:05 crossoverJie

Thanks for adding more context! Given that you want each application to maintain its own Instrumentation, it's unclear to me why adding the name of the instrumentation in the application is more effort than today? All of the use cases you mentioned are solved with the existing system of annotations by allowing users to specify which instrumentation they want to use.

I still think the broader use case you describe is valuable, however, and it would be much simpler if the instrumentation selected its applications too. I think that instrumentation selector only works (or works best) when the user doesn't need to modify their application whatsoever to do so.

jaronoff97 avatar May 28 '24 19:05 jaronoff97

Thanks for your reply, the following issues may require further discussion.

it's unclear to me why adding the name of the instrumentation in the application is more effort than today?

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
  name: order
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/container-names: "order"
        instrumentation.opentelemetry.io/inject-java: "true"
        # associate instrumentation
        instrumentation.opentelemetry.io/instrumentation-name: "instrumentation-prod"
...
----
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-prod
  namespace: prod
...

Do you mean to associate Instrumentation in deployment?


All of the use cases you mentioned are solved with the existing system of annotations by allowing users to specify which instrumentation they want to use.

I don't understand clearly here.

Similar to my last comment, in the current time we will create an Instrumentation for each application and define environment variables in this Instrumentation.

Is there any other way to configure these environment variables? We don't want to configure startup parameters in the application, we want all configuration to be done by Operator.


I think that instrumentation selector only works (or works best) when the user doesn't need to modify their application whatsoever to do so.

Not sure I understand what you mean. Currently there is no need to modify the application, just associate the application through the selector in the Instrumentation.

crossoverJie avatar May 29 '24 07:05 crossoverJie

the example you linked demonstrates what i was referring to i.e. creating an instrumentation object and then referencing it directly via its name in the application.


Currently there is no need to modify the application, just associate the application through the selector in the Instrumentation.

There is a need to modify the application with at least the following in the world where we applied a selector as your PR does today. per your example:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
  name: order
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/container-names: "order"
        instrumentation.opentelemetry.io/inject-java: "true" # <<< This is always needed today
        # associate instrumentation
        instrumentation.opentelemetry.io/instrumentation-name: "true" # <<< This is still needed in your PR

What i'm thinking is a world where a cluster admin like yourself would define an instrumentation like so:

apiVersion: opentelemetry.io/v1alpha1
kind: InstrumentationRules
metadata:
  name: instrumentation-prod
  namespace: prod
spec:
  rules:
  - name: inject-java-app-prod
     java: <java-config>
     env: []
     selector:
       matchLabels:
         app: order
         env: prod
  - name: inject-java-app-dev
     java: <java-config>
     env: []
     selector:
       matchLabels:
         app: order
         env: dev

This would then inject configuration for the following deployments without requiring any modifications in the deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
    env: prod
  name: prod-order
spec:
  ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
    env: dev
  name: dev-order
spec:
  ...
---

This to me would be closer to what skywalking can accomplish today and would allow cluster admins to entirely separate injection from application developers. Let me know what you think, thank you!

jaronoff97 avatar May 29 '24 13:05 jaronoff97

@jaronoff97 Sorry for the delayed response.

What i'm thinking is a world where a cluster admin like yourself would define an instrumentation like so:

Using a new CRD: InstrumentationRules seems like a good approach. We just need to move the configuration changes from Instrumentation to InstrumentationRules.

One more question: if we use InstrumentationRules, is it still necessary to keep Instrumentation? Or will Instrumentation be removed in a future update?

crossoverJie avatar Jul 04 '24 08:07 crossoverJie

@crossoverJie this is a good question, I think before we can answer it we should have a design in mind for InstrumentationRules to see if it's a superset or distinct (there are tradeoffs for either approach). Would you be able to work on that design? If not, I can see if anyone else is available to do so. (my cmd+enter pressed closed... oops!)

jaronoff97 avatar Jul 15 '24 18:07 jaronoff97

my cmd+enter pressed closed... oops!

Hahaha, no worries.đŸ˜‚

Would you be able to work on that design?

I will draft a proposal for InstrumentationRules as soon as possible. We've been running it in our testing environment for some time with select, and we hope to use it in production soon. The main goal is to stay consistent with the upstream code.

crossoverJie avatar Jul 16 '24 01:07 crossoverJie

apiVersion: opentelemetry.io/v1alpha1
kind: InstrumentationRules
metadata:
  name: instrumentation-prod
  # It will only select the Pods in this namespace.
  namespace: prod
spec:
  # If the rule is not specified, the exporter/propagators/sampler will be used
  exporter:
    endpoint: http://global-otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.35"

  rules:
    - name: order-service
      java:
        image: ""
        extensions:
          - image: ""
            dir: /extensions
        exporter:
          endpoint: http://prod-otel-collector:4317
        propagators:
          - tracecontext
          - baggage
          - b3
        sampler:
          type: parentbased_traceidratio
          argument: "0.25"
        env:
          - name: OTEL_EXPORTER_OTLP_PROTOCOL
            value: grpc
          - name: OTEL_SERVICE_NAME
            value: order-service
        # Multi-container pods with single instrumentation
        container-names: [order] # If there are multiple containers in the pod, specify the container name here, otherwise it will be use the first container in the pod.
      selector:
        matchLabels:
          app: order
    - name: user-service
      # Multi-container pods with multiple instrumentations
      java:
        env: []
        container-names: [user-java]
      python:
        env: []
        container-names: [user-python]
      selector:
        matchLabels:
          app: user

The purpose of this proposal is to flexibly configure Instrumentation for any application without modifying the Deployment.

Here, a resource named InstrumentationRules CRD is defined, which can implement the functionalities provided by Instrumentation.

Additionally, there are a few issues that need to be confirmed:

  • [ ] Can multiple InstrumentationRules be created?
    • [ ] I prefer to be able to create multiple, as operations can create an InstrumentationRules for each namespace, which makes it easier to maintain data.
  • [ ] In a given namespace, all Pods use the same InstrumentationRule.
    • [ ] I haven't found a particularly good way to declare it. Do you have any suggestions?
  • [ ] Compatibility issues with Instrumentation.
    • [ ] The first version prioritizes Instrumentation, and once the InstrumentationRule functionality is stable, it can replace Instrumentation.

This is just a draft, and I look forward to your reply.@jaronoff97 Thank you.

crossoverJie avatar Jul 17 '24 08:07 crossoverJie

apiVersion: opentelemetry.io/v1alpha1
kind: InstrumentationRules
metadata:
  name: instrumentation-rule-for-namespace
  # It will only select the Pods in this namespace.
  namespace: prod
spec:
  # If usedForNamespace is true, the rule will be used for the namespace.
  usedForNamespace: true
  # If the rule is not specified, the exporter/propagators/sampler will be used
  exporter:
    endpoint: http://global-otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.35"

  rules:
    # Only one rule is allowed for given namespace.
    - name: used-for-namespace
      java:
        env: []
      python:
        env: []

In a given namespace, all Pods use the same InstrumentationRule.

For this issue, I suggest adding a new field called usedForNamespace. When it is set to true, it indicates that the current instrumentationrules applys to the namespace. In this case, only one rule can exist.

If there are ohter InstrumentationRules applied to Pods at this time, they will have a higher priority.

@jaronoff97 Do you have any other suggestions? If not, I'd like to start developing this feature as soon as possible.

crossoverJie avatar Jul 23 '24 07:07 crossoverJie

cc @swiatekm @iblancasa https://github.com/open-telemetry/opentelemetry-operator/issues/2744#issuecomment-2232789129 https://github.com/open-telemetry/opentelemetry-operator/issues/2744#issuecomment-2244476064

Here is a new proposal regarding select, If you have time, please take a look.

crossoverJie avatar Jul 29 '24 02:07 crossoverJie

@crossoverJie thanks for working on this! I think it would be great if you could write an RFC for this given the functionality will be net new and there will be some edge cases and rollout issues I want to be sure we all understand and agree upon. I can reach out via slack and we can work on it. I'm going to be creating an RFC process and this feels like a great first RFC to write. If you wanted to start on an implementation so we can have something as a proof of concept, that would be great too.

jaronoff97 avatar Aug 01 '24 16:08 jaronoff97

May I know what is the latest update on this? Is this already implemented? Anyone working on this?

iamvishnuks avatar Aug 08 '25 10:08 iamvishnuks

@iamvishnuks I'm planning on working on this sometime in the next couple of months, we've been having a lot of discussion around design considerations, breaking changes, etc. I'm hoping to be writing an RFC soon

jaronoff97 avatar Aug 08 '25 14:08 jaronoff97

@iamvishnuks I'm planning on working on this sometime in the next couple of months, we've been having a lot of discussion around design considerations, breaking changes, etc. I'm hoping to be writing an RFC soon

@jaronoff97 What's the progress on the RFC? If some parts are already complete, can we submit a draft and work together to move it forward?

If you don't have time to handle it for now, I can draft the RFC document.

crossoverJie avatar Oct 30 '25 07:10 crossoverJie

@jaronoff97 Maybe we don't need to modify the CR; we can implement it by referring to this.

        instrumentation.opentelemetry.io/container-names: xx
        instrumentation.opentelemetry.io/inject-java: xx

crossoverJie avatar Nov 11 '25 09:11 crossoverJie