opentelemetry-dotnet-contrib icon indicating copy to clipboard operation
opentelemetry-dotnet-contrib copied to clipboard

[Resources.Container] Add Kubernetes support

Open joegoldman2 opened this issue 1 year ago • 31 comments

Fixes #1562. Only tested manually at the moment.

For significant contributions please make sure you have completed the following items:

  • [x] Appropriate CHANGELOG.md updated for non-trivial changes
  • [ ] Design discussion issue #
  • [ ] Changes in public API reviewed

joegoldman2 avatar Apr 27 '24 09:04 joegoldman2

Codecov Report

Attention: Patch coverage is 49.43820% with 45 lines in your changes missing coverage. Please review.

Project coverage is 70.89%. Comparing base (71655ce) to head (5e1af13). Report is 684 commits behind head on main.

Files with missing lines Patch % Lines
...elemetry.Resources.Container/K8sMetadataFetcher.cs 0.00% 20 Missing :warning:
...Telemetry.Resources.Container/ContainerDetector.cs 72.54% 14 Missing :warning:
...esources.Container/ContainerResourceEventSource.cs 21.42% 11 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1699      +/-   ##
==========================================
- Coverage   73.91%   70.89%   -3.03%     
==========================================
  Files         267      369     +102     
  Lines        9615    14227    +4612     
==========================================
+ Hits         7107    10086    +2979     
- Misses       2508     4141    +1633     
Flag Coverage Δ
unittests-Contrib.Shared.Tests 84.04% <ø> (?)
unittests-Exporter.Geneva 56.67% <ø> (?)
unittests-Exporter.InfluxDB 93.92% <ø> (?)
unittests-Exporter.Instana 74.86% <ø> (?)
unittests-Exporter.OneCollector 94.50% <ø> (?)
unittests-Exporter.Stackdriver 79.26% <ø> (?)
unittests-Extensions 88.63% <ø> (?)
unittests-Extensions.Enrichment 100.00% <ø> (?)
unittests-Instrumentation.AWS 86.39% <ø> (?)
unittests-Instrumentation.AspNetCore 70.33% <ø> (?)
unittests-Instrumentation.ConfluentKafka 14.37% <ø> (?)
unittests-Instrumentation.ElasticsearchClient 80.12% <ø> (?)
unittests-Instrumentation.EntityFrameworkCore 57.06% <ø> (?)
unittests-Instrumentation.EventCounters 76.36% <ø> (?)
unittests-Instrumentation.GrpcCore 91.42% <ø> (?)
unittests-Instrumentation.GrpcNetClient 79.61% <ø> (?)
unittests-Instrumentation.Hangfire 93.58% <ø> (?)
unittests-Instrumentation.Http 74.33% <ø> (?)
unittests-Instrumentation.Owin 88.12% <ø> (?)
unittests-Instrumentation.Process 100.00% <ø> (?)
unittests-Instrumentation.Quartz 78.76% <ø> (?)
unittests-Instrumentation.Runtime 100.00% <ø> (?)
unittests-Instrumentation.SqlClient 88.38% <ø> (?)
unittests-Instrumentation.StackExchangeRedis 68.26% <ø> (?)
unittests-Instrumentation.Wcf 49.55% <ø> (?)
unittests-PersistentStorage 65.88% <ø> (?)
unittests-Resources.AWS 75.60% <ø> (?)
unittests-Resources.Azure 84.56% <ø> (?)
unittests-Resources.Container 55.17% <49.43%> (?)
unittests-Resources.Gcp 71.15% <ø> (?)
unittests-Resources.Host 73.91% <ø> (?)
unittests-Resources.OperatingSystem 76.98% <ø> (?)
unittests-Resources.Process 100.00% <ø> (?)
unittests-Resources.ProcessRuntime 78.26% <ø> (?)
unittests-Sampler.AWS 88.12% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...y.Resources.Container/Models/K8sContainerStatus.cs 100.00% <100.00%> (ø)
...OpenTelemetry.Resources.Container/Models/K8sPod.cs 100.00% <100.00%> (ø)
...lemetry.Resources.Container/Models/K8sPodStatus.cs 100.00% <100.00%> (ø)
...esources.Container/ContainerResourceEventSource.cs 21.42% <21.42%> (ø)
...Telemetry.Resources.Container/ContainerDetector.cs 74.66% <72.54%> (ø)
...elemetry.Resources.Container/K8sMetadataFetcher.cs 0.00% <0.00%> (ø)

... and 382 files with indirect coverage changes

codecov[bot] avatar Apr 27 '24 09:04 codecov[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar May 15 '24 03:05 github-actions[bot]

Not stale. I'll try to finish review this week.

iskiselev avatar May 15 '24 03:05 iskiselev

Similar work happens in opentelemtry-js: open-telemetry/opentelemetry-js-contrib#1962 There are still few open questions:

  • there is no standard way to get container name. It should be provided by customer - most probably on constructor. We may try to use some environment variable as fallback option. Javascript implementation currently suggest usage of CONTAINER_NAME environment variable. This PR use KUBERNETES_CONTAINER_NAME environment varaible. ContainerName is not available with downwards API, but can be provided using yaml anchors or manually
  • there is no standard way to pod name. It should be provided by customer - most probably on constructor. We may try to use HOSTNAME environment variable as fallback option. Javascript implementation currently suggest usage of POD_NAME environment variable. This PR uses HOSTNAME environment variable for it. It may not work on some environments of if user override hostname in PodSpec. PodName can be available through environment variable with downwards API.
  • Namespace is available be default in /var/run/secrets/kubernetes.io/serviceaccount/namespace file. It can be also provided by downwards API, though reading it from file I like more

It would be good if we synchronize fallback environment variable with JS implementation

To make it work, customer should provide get permission on pods object. We should extend readme showing how to make it working. I was able to make it functional with next sample:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reader-account
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] 
  resources: ["pods"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: pod-reader-account
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Pod
metadata:
  name: container-resolve-demo
spec:
  serviceAccountName: pod-reader-account
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: &container_name my_container_name
    image: ubuntu:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    env:
    - name: KUBERNETES_CONTAINER_NAME
      value: *container_name
    - name: KUBERNETES_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name

iskiselev avatar May 23 '24 05:05 iskiselev

And the hardest part would be a unit test / integration for it. We can move reading environment variables / files to separate class, so that we would be able to mock it. After it we can start ASP.NET web service and use it to validate that request correctly send / parsed. It may be much more work to make it use custom certificate, so it can probably be skipped. Do you have other ideas on easier tests?

iskiselev avatar May 23 '24 05:05 iskiselev

  • there is no standard way to get container name. It should be provided by customer - most probably on constructor. We may try to use some environment variable as fallback option. Javascript implementation currently suggest usage of CONTAINER_NAME environment variable. This PR use KUBERNETES_CONTAINER_NAME environment varaible. ContainerName is not available with downwards API, but can be provided using yaml anchors or manually

The PR in opentelemetry-js has not yet been approved. I think it's better to prefix with KUBERNETES_ to follow the Kubernetes naming convention. What do you think?

  • there is no standard way to pod name. It should be provided by customer - most probably on constructor. We may try to use HOSTNAME environment variable as fallback option. Javascript implementation currently suggest usage of POD_NAME environment variable. This PR uses HOSTNAME environment variable for it. It may not work on some environments of if user override hostname in PodSpec. PodName can be available through environment variable with downwards API.

I added KUBERNETES_POD_NAME and a fallback to HOSTNAME if not provided.

  • Namespace is available be default in /var/run/secrets/kubernetes.io/serviceaccount/namespace file. It can be also provided by downwards API, though reading it from file I like more

I removed KUBERNETES_NAMESPACE and I'm now reading the namespace from this file. Thank you for the suggestion!

To make it work, customer should provide get permission on pods object. We should extend readme showing how to make it working. I was able to make it functional with next sample:

I added some details in the README explaining what permission is needed to use this detector in a Kubernetes environment.

joegoldman2 avatar May 23 '24 06:05 joegoldman2

Detectors always have a parameter-less constructor until now. Any suggestion?

HostDetector has an internal constructor to facilitate testing

matt-hensley avatar May 23 '24 20:05 matt-hensley

Please validate that container name was defined by user and fail early if it was not.

Instead of KUBERNETES_SERVICE_HOST, right?

You can do it either instead of or in addition. Both have some sense.

iskiselev avatar May 25 '24 19:05 iskiselev

Logic / code wise it looks good to me. But I fully support concerns of @Kielek about establishing and following standards across OTEL implementation. And it is tricky for that logic.

We are working on container resource detector package. Talking in terms defined in container and k8s resources, in this PR logic to extract container.id in Kuberentes environment was implemented. To do it, we need to know: k8s.container.name, k8s.namespace.name, and k8s.pod.name. There is well defined way to extract k8s.namespace.name, semi-working way to resolve k8s.pod.name and no way to resolve k8s.container.name. To overcome that, we suggest use user-defined environment variables KUBERNETES_CONTAINER_NAME and KUBERNETES_POD_NAME (with advices how to configure it in README.md.

There are attributes that "MAY be configurable via a dedicated environment variable" per resources and variables list, but all OTEL used variables use OTEL_ prefix.

Should that package be responsible for both cgroup containers and k8s container? Or should it be separate packages? Is it ok for detector to identify resource in another category? How detectors / detector package can reuse data collected by another resource detector?

SKD spec suggests:

Custom resource detectors related to generic platforms (e.g. Docker, Kubernetes) or vendor specific environments (e.g. EKS, AKS, GKE) MUST be implemented as packages separate from the SDK. Resource detector packages MAY detect resource information from multiple possible sources and merge the result using the Merge operation described above.

With all of that: @Kielek, do you think we can merge it in current state (we still have pre-release of the package), and open issues here and semantic-conventions repo to discuss it? If not, I can suggest to create special user-provided configuration to enable K8s container resolution and do not resolve it from environment variable internally. Something that would be used like

var tracerProvider = Sdk.CreateTracerProviderBuilder()
                        // other configurations
                        .ConfigureResource(resource => resource
                            .AddContainerDetector(
                               settings => settings.EnableK8sResolution(
                                 Environment.GetVaraible("KUBERNETES_CONTAINER_NAME"),
                                 Environment.GetVaraible("KUBERNETES_POD_NAME")))
                        .Build();

iskiselev avatar Jun 13 '24 01:06 iskiselev

As this is something that can already be done by collector, as noted in this comment, I think it'd be better to have consensus on this approach in broader opentelemetry community, as suggested by @Kielek, before merging this PR. I assume scenario with applications running in k8s cluster sending telemetry directly to the backend (e.g without collector deployed) is not that common.

lachmatt avatar Jun 13 '24 11:06 lachmatt

If I read k8s processor code correct, it in fact set container id if it was not set. For it, AttributeK8SContainerName should be set. Same is documented.

iskiselev avatar Jun 13 '24 21:06 iskiselev

I have an idea how we can finish this PR and still be in full compliance with already approved semantic conventions and idea behind k8s processor implementation. Though it will have one small hack (that can be improved later once proper API will be provided).

K8s processor imply that user will provide k8s.container.name to be able to resolve container id. And user can do it already, using OTEL_RESOURCE_ATTRIBUTES environment variable. All it need to do is to add proper pairs there. Something like:

apiVersion: v1
kind: Pod
metadata:
  name: container-resolve-demo
spec:
  serviceAccountName: pod-reader-account
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: &container_name my-container-name
    image: ubuntu:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    env:
    - name: OTEL_RESOURCE_ATTRIBUTES_K8S_CONTAINER_NAME
      value: *container_name
    - name: OTEL_RESOURCE_ATTRIBUTES_K8S_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_K8S_POD_NAME),k8s.container.name=$(OTEL_RESOURCE_ATTRIBUTES_K8S_CONTAINER_NAME)

Now, instead of accessing some environment variables, we just need to get attributes from OTEL_RESOURCE_ATTRIBUTES - effectively use one resource detector in another one. I have not seen it anywhere, but we can do something like:

internal static class K8sSemanticConventions
{
    public const string AttributePodName = "k8s.pod.name";
    public const string AttributeContainerName = "k8s.container.name";
}

private string? ExtractK8sContainerId()
{
  // Is there any way to do it better?
  var environmentVariableAttributes = ResourceBuilder.CreateEmpty()
          .AddEnvironmentVariableDetector().Build().Attributes;
  var podName = environmentVariableAttributes
      .FirstOrDefault(atr => atr.Key == K8sSemanticConventions.AttributeContainerName)
      .Value?.ToString();
  var containerName = environmentVariableAttributes
      .FirstOrDefault(atr => atr.Key == K8sSemanticConventions.AttributePodName)
      .Value?.ToString();
  ...
}

iskiselev avatar Jun 13 '24 22:06 iskiselev

And user can do it already, using OTEL_RESOURCE_ATTRIBUTES environment variable.

With this approach, the pod name and container name will be part of the Resource (as it's quite common to enable the env variable detector) and will be included in all traces/metrics/logs, which is not necessarily the desired behavior. 😕

joegoldman2 avatar Jun 14 '24 03:06 joegoldman2

@joegoldman2, currently I see several options:

  • start new discussion in opentelemetry-specification about new environment variable
  • make customer pass container name manually as parameter
  • use container name as already specified resource
  • move K8s logic to separate detector that would require container name on constructor (and somehow make container detector never override container-id detected by controller)
  • do nothing, suggesting to use processor feature on collector (not everyone use collector!)

Or some combination of that approaches. I don't like last option (do nothing).

iskiselev avatar Jun 14 '24 06:06 iskiselev

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Jun 22 '24 03:06 github-actions[bot]

@iskiselev I'm not really sure what to do next? Do you want to initiate a new discussion in opentelemetry-specification about new environment variables?

joegoldman2 avatar Jun 22 '24 10:06 joegoldman2

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Jul 06 '24 03:07 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Jul 14 '24 03:07 github-actions[bot]

Not stale. open-telemetry/opentelemetry-specification#4140 opened to discuss next step. @joegoldman2, if new config environment variables will be approved, we will still need to make it configurable though factory method parameters per: https://github.com/open-telemetry/opentelemetry-specification/blob/27bc9ffad2b386626367dd2f98d0ba2ddf35a18f/specification/configuration/sdk-environment-variables.md?plain=1#L54

iskiselev avatar Jul 14 '24 04:07 iskiselev

Ok, so I guess the next step is to define the desired API to allow users to set the values programmatically, right?

joegoldman2 avatar Jul 15 '24 16:07 joegoldman2

@joegoldman2, I already suggested API at https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699#issuecomment-2164160848, but any would works once it allows to specify values. I'm not sure, whether we will see any move open-telemetry/opentelemetry-specification#4140. With program API we should be able to merge this PR even without environment values support if consensus would not be reached. Btw, do you see an option to merge it already (with @Kielek approval) or is it requires anything else? I still fully agree with @Kielek , that we should either get common agreement on environment values usage and document them - or do not use them at all.

iskiselev avatar Jul 15 '24 20:07 iskiselev

And looks like new environment variable would not be added any time soon: https://github.com/open-telemetry/opentelemetry-specification/issues/2891#issuecomment-1289241503. So, using API-based values may be the only option.

iskiselev avatar Jul 15 '24 20:07 iskiselev

Thank you for the API proposal https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699#issuecomment-2164160848. What is the procedure to review/validate a new API?

joegoldman2 avatar Jul 17 '24 10:07 joegoldman2

@joegoldman2, I actually don't know what should be the procedure to review/validate a new API. From my point of view (I'm curently codeowner for src/OpenTelemetry.Resources.Container), anything that will give a way to provide required values from code should work, as it will be in compliance with specification. As it will be isolated as arguments of AddContainerDetector method, we probably don't need any wider audience review. Alternatvely we can try to visit Tuesday meeting and seek opinions / advices there.

iskiselev avatar Jul 27 '24 01:07 iskiselev

BTW, we just finished validation of using k8sattributes processor on collector, and it perfectly solves the problem of resovling container.id for k8s cluster (once k8s.container.name attibute added).

iskiselev avatar Jul 27 '24 01:07 iskiselev

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Aug 03 '24 03:08 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Aug 25 '24 03:08 github-actions[bot]

@joegoldman2, we published instruction how to use K8 attribute processor for resolveing container id in OTEL collector: https://docs.appdynamics.com/observability/cisco-cloud-observability/en/application-performance-monitoring/monitor-applications-in-kubernetes/infrastructure-correlation#InfrastructureCorrelation-KubernetesAttributeProcessorConfiguredontheOpenTelemetryCollector

iskiselev avatar Aug 29 '24 18:08 iskiselev

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Sep 06 '24 03:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Oct 03 '24 03:10 github-actions[bot]