opentelemetry-python
Generate a service.instance.id resource attribute if it is not present
[service.instance.id] MUST be unique for each instance of the same service.namespace,service.name pair (in other words service.namespace,service.name,service.instance.id triplet MUST be globally unique). The ID helps to distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled service) [...] If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random [UUID]
At the moment, just using the default resource in our SDK violates the requirement that "service.namespace,service.name,service.instance.id triplet MUST be globally unique":
In [1]: from opentelemetry.sdk.resources import Resource
In [2]: Resource.create().attributes
Out[2]: BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.5.0', 'service.name': 'unknown_service'}, maxlen=None)
This is very important for metrics where resource is part of the identity of a time series and we must not violate the single-writer principle.
- Should we generate the service.instance.id from a UUID if it is not present? I think yes, so that we do not violate the global uniqueness.
- Should we try to automagically populate this with a more meaningful value when it is present in a resource? E.g. container.id when present (this might be a spec question).
- Since this is such a common use case, should we provide an easy way for the user to pass some of these key service.* attributes into the SDK if they don't want it generated? E.g. an environment variable, constructor kwarg to the Tracer/Meter provider, kwarg to the resource.
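To make the first and third questions concrete, here is a minimal sketch of the "generate only if absent" behaviour, using plain dicts rather than the SDK's Resource class. The helper names `parse_resource_attributes` and `ensure_instance_id` are hypothetical and not part of the SDK. The only existing name it relies on is the `OTEL_RESOURCE_ATTRIBUTES` environment variable, and the parsing is simplified (it does not percent-decode values).

```python
import os
import uuid

INSTANCE_ID_KEY = "service.instance.id"


def parse_resource_attributes(raw: str) -> dict:
    """Parse 'key1=value1,key2=value2' into a dict, skipping malformed items.

    Hypothetical helper; simplified, no percent-decoding of values.
    """
    attributes = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


def ensure_instance_id(attributes: dict) -> dict:
    """Return a copy of attributes with service.instance.id filled in if absent."""
    result = dict(attributes)
    if not result.get(INSTANCE_ID_KEY):
        # No user-supplied value: fall back to a random (version 4) UUID.
        result[INSTANCE_ID_KEY] = str(uuid.uuid4())
    return result


# Values supplied by the user (here via the environment) take precedence
# over the generated fallback.
env_attributes = parse_resource_attributes(
    os.environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
)
resource_attributes = ensure_instance_id(
    {"service.name": "unknown_service", **env_attributes}
)
```

With this shape, a user who sets `service.instance.id` themselves never sees a generated value, and everyone else gets a distinct ID per process start.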
I like the idea of generating one if no service.instance.id is present.
An action item for me is to investigate what other SIGs are doing and when this was added to the spec. At the SIG meeting, we decided that option 2 would need some spec discussion, so we will skip it for now.
There's some spec discussion here https://github.com/open-telemetry/opentelemetry-specification/issues/1034. One of the things they mentioned is similar to option 2. Let's see if we can get some clarity in the spec.
@aabmass, has there been any new update on this?
Spec discussion https://github.com/open-telemetry/opentelemetry-specification/issues/3136
There have been some updates in the spec regarding this issue since the last post in this thread.
Implementations, such as SDKs, are recommended to generate a random Version 1 or Version 4 RFC 4122 UUID, but are free to use an inherent unique ID as the source of this value if stability is desirable. In that case, the ID SHOULD be used as source of a UUID Version 5 and SHOULD use the following UUID as the namespace: 4d63009a-8d0f-11ee-aad7-4c796ed8e320. [...] For applications running behind an application server (like unicorn), we do not recommend using one identifier for all processes participating in the application. Instead, it's recommended each division (e.g. a worker thread in unicorn) to have its own instance.id.
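The two options in that spec text can be sketched with the standard library's `uuid` module. The function names below are hypothetical; the namespace UUID is the one quoted above, and the `inherent_id` argument stands for whatever stable identifier the environment provides (for example a container ID), which is an assumption of this sketch.

```python
import uuid

# Namespace UUID quoted from the spec text above.
OTEL_INSTANCE_ID_NAMESPACE = uuid.UUID("4d63009a-8d0f-11ee-aad7-4c796ed8e320")


def random_instance_id() -> str:
    """Random version 4 UUID: unique per call, not stable across restarts."""
    return str(uuid.uuid4())


def stable_instance_id(inherent_id: str) -> str:
    """Version 5 UUID derived from an inherent ID: same input, same output."""
    return str(uuid.uuid5(OTEL_INSTANCE_ID_NAMESPACE, inherent_id))
```

The version 5 variant trades uniqueness per start for stability: restarting the same instance yields the same service.instance.id as long as the inherent ID does not change.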
Such an identifier would be a lifesaver for automated differentiation of metric streams from services with a fork-process model.