opentelemetry-dotnet-contrib
opentelemetry-dotnet-contrib copied to clipboard
[Resources.Container] Add Kubernetes support
Fixes #1562. Only tested manually at the moment.
For significant contributions please make sure you have completed the following items:
- [x] Appropriate
CHANGELOG.mdupdated for non-trivial changes - [ ] Design discussion issue #
- [ ] Changes in public API reviewed
Codecov Report
Attention: Patch coverage is 49.43820% with 45 lines in your changes missing coverage. Please review.
Project coverage is 70.89%. Comparing base (
71655ce) to head (5e1af13). Report is 684 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1699 +/- ##
==========================================
- Coverage 73.91% 70.89% -3.03%
==========================================
Files 267 369 +102
Lines 9615 14227 +4612
==========================================
+ Hits 7107 10086 +2979
- Misses 2508 4141 +1633
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...y.Resources.Container/Models/K8sContainerStatus.cs | 100.00% <100.00%> (ø) |
|
| ...OpenTelemetry.Resources.Container/Models/K8sPod.cs | 100.00% <100.00%> (ø) |
|
| ...lemetry.Resources.Container/Models/K8sPodStatus.cs | 100.00% <100.00%> (ø) |
|
| ...esources.Container/ContainerResourceEventSource.cs | 21.42% <21.42%> (ø) |
|
| ...Telemetry.Resources.Container/ContainerDetector.cs | 74.66% <72.54%> (ø) |
|
| ...elemetry.Resources.Container/K8sMetadataFetcher.cs | 0.00% <0.00%> (ø) |
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Not stale. I'll try to finish review this week.
Similar work happens in opentelemtry-js: open-telemetry/opentelemetry-js-contrib#1962 There are still few open questions:
- there is no standard way to get container name. It should be provided by customer - most probably on constructor. We may try to use some environment variable as fallback option. Javascript implementation currently suggest usage of
CONTAINER_NAMEenvironment variable. This PR use KUBERNETES_CONTAINER_NAME environment varaible. ContainerName is not available with downwards API, but can be provided using yaml anchors or manually - there is no standard way to pod name. It should be provided by customer - most probably on constructor. We may try to use
HOSTNAMEenvironment variable as fallback option. Javascript implementation currently suggest usage ofPOD_NAMEenvironment variable. This PR usesHOSTNAMEenvironment variable for it. It may not work on some environments of if user override hostname in PodSpec. PodName can be available through environment variable with downwards API. - Namespace is available be default in
/var/run/secrets/kubernetes.io/serviceaccount/namespacefile. It can be also provided by downwards API, though reading it from file I like more
It would be good if we synchronize fallback environment variable with JS implementation
To make it work, customer should provide get permission on pods object. We should extend readme showing how to make it working. I was able to make it functional with next sample:
apiVersion: v1
kind: ServiceAccount
metadata:
name: pod-reader-account
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: default
name: pod-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods
namespace: default
subjects:
- kind: ServiceAccount
name: pod-reader-account
namespace: default
roleRef:
kind: Role
name: pod-reader
apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Pod
metadata:
name: container-resolve-demo
spec:
serviceAccountName: pod-reader-account
volumes:
- name: shared-data
emptyDir: {}
containers:
- name: &container_name my_container_name
image: ubuntu:latest
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
env:
- name: KUBERNETES_CONTAINER_NAME
value: *container_name
- name: KUBERNETES_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
And the hardest part would be a unit test / integration for it. We can move reading environment variables / files to separate class, so that we would be able to mock it. After it we can start ASP.NET web service and use it to validate that request correctly send / parsed. It may be much more work to make it use custom certificate, so it can probably be skipped. Do you have other ideas on easier tests?
- there is no standard way to get container name. It should be provided by customer - most probably on constructor. We may try to use some environment variable as fallback option. Javascript implementation currently suggest usage of
CONTAINER_NAMEenvironment variable. This PR use KUBERNETES_CONTAINER_NAME environment varaible. ContainerName is not available with downwards API, but can be provided using yaml anchors or manually
The PR in opentelemetry-js has not yet been approved. I think it's better to prefix with KUBERNETES_ to follow the Kubernetes naming convention. What do you think?
- there is no standard way to pod name. It should be provided by customer - most probably on constructor. We may try to use
HOSTNAMEenvironment variable as fallback option. Javascript implementation currently suggest usage ofPOD_NAMEenvironment variable. This PR usesHOSTNAMEenvironment variable for it. It may not work on some environments of if user override hostname in PodSpec. PodName can be available through environment variable with downwards API.
I added KUBERNETES_POD_NAME and a fallback to HOSTNAME if not provided.
- Namespace is available be default in
/var/run/secrets/kubernetes.io/serviceaccount/namespacefile. It can be also provided by downwards API, though reading it from file I like more
I removed KUBERNETES_NAMESPACE and I'm now reading the namespace from this file. Thank you for the suggestion!
To make it work, customer should provide get permission on pods object. We should extend readme showing how to make it working. I was able to make it functional with next sample:
I added some details in the README explaining what permission is needed to use this detector in a Kubernetes environment.
Detectors always have a parameter-less constructor until now. Any suggestion?
HostDetector has an internal constructor to facilitate testing
Please validate that container name was defined by user and fail early if it was not.
Instead of
KUBERNETES_SERVICE_HOST, right?
You can do it either instead of or in addition. Both have some sense.
Logic / code wise it looks good to me. But I fully support concerns of @Kielek about establishing and following standards across OTEL implementation. And it is tricky for that logic.
We are working on container resource detector package. Talking in terms defined in container and k8s resources, in this PR logic to extract container.id in Kuberentes environment was implemented. To do it, we need to know: k8s.container.name, k8s.namespace.name, and k8s.pod.name. There is well defined way to extract k8s.namespace.name, semi-working way to resolve k8s.pod.name and no way to resolve k8s.container.name. To overcome that, we suggest use user-defined environment variables KUBERNETES_CONTAINER_NAME and KUBERNETES_POD_NAME (with advices how to configure it in README.md.
There are attributes that "MAY be configurable via a dedicated environment variable" per resources and variables list, but all OTEL used variables use OTEL_ prefix.
Should that package be responsible for both cgroup containers and k8s container? Or should it be separate packages? Is it ok for detector to identify resource in another category? How detectors / detector package can reuse data collected by another resource detector?
SKD spec suggests:
Custom resource detectors related to generic platforms (e.g. Docker, Kubernetes) or vendor specific environments (e.g. EKS, AKS, GKE) MUST be implemented as packages separate from the SDK. Resource detector packages MAY detect resource information from multiple possible sources and merge the result using the Merge operation described above.
With all of that: @Kielek, do you think we can merge it in current state (we still have pre-release of the package), and open issues here and semantic-conventions repo to discuss it? If not, I can suggest to create special user-provided configuration to enable K8s container resolution and do not resolve it from environment variable internally. Something that would be used like
var tracerProvider = Sdk.CreateTracerProviderBuilder()
// other configurations
.ConfigureResource(resource => resource
.AddContainerDetector(
settings => settings.EnableK8sResolution(
Environment.GetVaraible("KUBERNETES_CONTAINER_NAME"),
Environment.GetVaraible("KUBERNETES_POD_NAME")))
.Build();
As this is something that can already be done by collector, as noted in this comment, I think it'd be better to have consensus on this approach in broader opentelemetry community, as suggested by @Kielek, before merging this PR. I assume scenario with applications running in k8s cluster sending telemetry directly to the backend (e.g without collector deployed) is not that common.
If I read k8s processor code correct, it in fact set container id if it was not set. For it, AttributeK8SContainerName should be set. Same is documented.
I have an idea how we can finish this PR and still be in full compliance with already approved semantic conventions and idea behind k8s processor implementation. Though it will have one small hack (that can be improved later once proper API will be provided).
K8s processor imply that user will provide k8s.container.name to be able to resolve container id. And user can do it already, using OTEL_RESOURCE_ATTRIBUTES environment variable. All it need to do is to add proper pairs there. Something like:
apiVersion: v1
kind: Pod
metadata:
name: container-resolve-demo
spec:
serviceAccountName: pod-reader-account
volumes:
- name: shared-data
emptyDir: {}
containers:
- name: &container_name my-container-name
image: ubuntu:latest
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
env:
- name: OTEL_RESOURCE_ATTRIBUTES_K8S_CONTAINER_NAME
value: *container_name
- name: OTEL_RESOURCE_ATTRIBUTES_K8S_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: OTEL_RESOURCE_ATTRIBUTES
value: k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_K8S_POD_NAME),k8s.container.name=$(OTEL_RESOURCE_ATTRIBUTES_K8S_CONTAINER_NAME)
Now, instead of accessing some environment variables, we just need to get attributes from OTEL_RESOURCE_ATTRIBUTES - effectively use one resource detector in another one. I have not seen it anywhere, but we can do something like:
internal static class K8sSemanticConventions
{
public const string AttributePodName = "k8s.pod.name";
public const string AttributeContainerName = "k8s.container.name";
}
private string? ExtractK8sContainerId()
{
// Is there any way to do it better?
var environmentVariableAttributes = ResourceBuilder.CreateEmpty()
.AddEnvironmentVariableDetector().Build().Attributes;
var podName = environmentVariableAttributes
.FirstOrDefault(atr => atr.Key == K8sSemanticConventions.AttributeContainerName)
.Value?.ToString();
var containerName = environmentVariableAttributes
.FirstOrDefault(atr => atr.Key == K8sSemanticConventions.AttributePodName)
.Value?.ToString();
...
}
And user can do it already, using OTEL_RESOURCE_ATTRIBUTES environment variable.
With this approach, the pod name and container name will be part of the Resource (as it's quite common to enable the env variable detector) and will be included in all traces/metrics/logs, which is not necessarily the desired behavior. 😕
@joegoldman2, currently I see several options:
- start new discussion in opentelemetry-specification about new environment variable
- make customer pass container name manually as parameter
- use container name as already specified resource
- move K8s logic to separate detector that would require container name on constructor (and somehow make container detector never override container-id detected by controller)
- do nothing, suggesting to use processor feature on collector (not everyone use collector!)
Or some combination of that approaches. I don't like last option (do nothing).
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@iskiselev I'm not really sure what to do next? Do you want to initiate a new discussion in opentelemetry-specification about new environment variables?
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Not stale. open-telemetry/opentelemetry-specification#4140 opened to discuss next step. @joegoldman2, if new config environment variables will be approved, we will still need to make it configurable though factory method parameters per: https://github.com/open-telemetry/opentelemetry-specification/blob/27bc9ffad2b386626367dd2f98d0ba2ddf35a18f/specification/configuration/sdk-environment-variables.md?plain=1#L54
Ok, so I guess the next step is to define the desired API to allow users to set the values programmatically, right?
@joegoldman2, I already suggested API at https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699#issuecomment-2164160848, but any would works once it allows to specify values. I'm not sure, whether we will see any move open-telemetry/opentelemetry-specification#4140. With program API we should be able to merge this PR even without environment values support if consensus would not be reached. Btw, do you see an option to merge it already (with @Kielek approval) or is it requires anything else? I still fully agree with @Kielek , that we should either get common agreement on environment values usage and document them - or do not use them at all.
And looks like new environment variable would not be added any time soon: https://github.com/open-telemetry/opentelemetry-specification/issues/2891#issuecomment-1289241503. So, using API-based values may be the only option.
Thank you for the API proposal https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699#issuecomment-2164160848. What is the procedure to review/validate a new API?
@joegoldman2, I actually don't know what should be the procedure to review/validate a new API. From my point of view (I'm curently codeowner for src/OpenTelemetry.Resources.Container), anything that will give a way to provide required values from code should work, as it will be in compliance with specification. As it will be isolated as arguments of AddContainerDetector method, we probably don't need any wider audience review. Alternatvely we can try to visit Tuesday meeting and seek opinions / advices there.
BTW, we just finished validation of using k8sattributes processor on collector, and it perfectly solves the problem of resovling container.id for k8s cluster (once k8s.container.name attibute added).
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@joegoldman2, we published instruction how to use K8 attribute processor for resolveing container id in OTEL collector: https://docs.appdynamics.com/observability/cisco-cloud-observability/en/application-performance-monitoring/monitor-applications-in-kubernetes/infrastructure-correlation#InfrastructureCorrelation-KubernetesAttributeProcessorConfiguredontheOpenTelemetryCollector
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.