opentelemetry-specification
opentelemetry-specification copied to clipboard
Support map values and nested values for attributes
After https://github.com/open-telemetry/opentelemetry-specification/pull/368 gets merged we will have support for array values.
If we add support for maps and nesting it will allow to represent arbitrary nested data structures in attribute values if needed.
This will apply to span and resource attributes.
Can we explicitly state that this applies to resources too? I believe that span attributes and resources (which are only specified in the proto, currently) are specified with the same structure.
Are there any use cases for arbitrary nesting? I think (multi)maps would be useful to store, e.g., HTTP headers, but what would be the rationale for arbitrary nesting?
Arbitrary nesting map can represent the classified values e.g. {"http" : {"url":...,"method":...}} or {"sql" : {"query":...,"engine":...}}. It can also host vendor specific data like {"aws": {"account_id":...}} in the situations when Resource isn't the right place, e.g. client side metrics
In #579, Tigran's example seems to contain a use-case. The resource of "application B" is a set of key-value attributes.
Semantically, I agree that the dotted string notation is equivalent to a map, although I'd like to point out, that at least from the tracing client perspective, dotted strings are a slightly more efficient representation.
Consider the following representations for the key-value pair 'http.method': 'GET'.
Dotted string representation
{ 'http.method': 'GET }
To represent this we need 1 map and 2 strings; 3 total objects.
Map representation
{ { 'http' : { 'method': 'get' } }
This requires 2 maps, 3 strings; 5 total objects.
Furthermore, most tracing backends do not support nested attributes (as far as I know), and will need to flatten them into dotted strings. This is something that will either have to been done in the tracing clients during export, or by the backends on ingest.
While I recognize that this does have some advantages in regards to semantics and for the data that can be represented, it does introduce complexity into tracing clients and backends. I'm not saying we shouldn't pursue this proposal, but we should discuss what the actual benefits are, and whether the added complexity is worth the tradeoff.
I'd also like to add that with the dotted-string notation, tracing clients can reduce the runtime string allocations to 0 for attribute keys by introducing constants for semantic conventions (and any other commonly used keys). We would lose this ability by changing to nested maps.
I should also clarify that I am completely ok with array support. It's the nested map support that I have reservations about.
@bogdandrutu can you please clarify why is this reopened?
@tigrannajaryan because of the last week discussion and concerns raised by @mwear
Was closing it even intentional? I can't remember any final decision in this issue. At least it's not documented here?
I guess closing this via a commit into some personal repository (tigrannajaryan/exp-otelproto@1507a2f) was accidental? It's surprising to me that GitHub allows this (I would even be tempted to call that a bug).
I guess closing this via a commit into some personal repository (tigrannajaryan/exp-otelproto@1507a2f) was accidental? It's surprising to me that GitHub allows this (I would even be tempted to call that a bug).
Definitely not intentional. I don't understand why does a commit in my personal repo which is not even a fork of this one result in closing an issue here. That's weird. Agree it appears to be a bug.
To clarify: are nested arrays allowed, e.g. [][]int, or only single dimensional arrays e.g. []int
Currently only homogeneous, single-dimensional arrays of primitives are allowed. This issue is about relaxing that. EDIT: Discussion is happening mostly on https://github.com/open-telemetry/opentelemetry-specification/pull/596 recently.
Currently only homogeneous, single-dimensional arrays of primitives are allowed. This issue is about relaxing that.
Great. So my linked PR at the moment meets the spec, but won't once we relax the criteria.
I maintain we should not relax the criteria, but see #596. This has also been discussed in a few SIG Spec meetings.
Corresponding PR didn't recieve a single approve. https://github.com/open-telemetry/opentelemetry-specification/pull/596#issuecomment-673883652 Meaning nobody is seeking for this feature. Moving this issue to Future milestone.
For the record for future discussions: map values and nested values are currently supported at the OTLP protocol level, but are not utilized by the OpenTelemetry API. This issue is to discuss whether we also want to support such values in the API.
Given map values and nested values are supported at the OTLP protocol level, should we treat the current spec of MUST for only primitives and homogeneous lists to be the base requirement and potentially add a MAY for maps and nested values? The expanded types have been working well at a trace level, so strictly implementing the spec would require us to add validations and restrict inputs, which I'm hesitant to add when it's working and we remove the need for added operations.
https://github.com/open-telemetry/opentelemetry-erlang/issues/111
This issue is missing specific use cases / requirements. Here's one: capturing all http headers of a request.
I have a couple of questions/observations regarding the HTTP headers use case:
- If we allow adding nested structures/maps to
Attributes, what type should the attribute key represent? AlsoAttributes? E.g. in JavaAttributeKey<Attributes> attributesKey(String key)would be the factory method - How do we serialize nested structures in non-OTLP formats? Currently in Jaeger & Zipkin exporters we're serializing arrays as JSON lists (
["a","b","c"]), by analogy should we serialize nested structures as JSON objects? Or should flatten them like describes in this comment https://github.com/open-telemetry/opentelemetry-specification/issues/376#issuecomment-643488960 ? - Now that
Attributesare also used in the metrics API, what would happen if you passed a nestedAttributesinstance to a meter? Since they're not allowed for metrics
If we allow adding nested structures/maps to Attributes, what type should the attribute key represent? Also Attributes? E.g. in Java AttributeKey<Attributes> attributesKey(String key) would be the factory method
That is a very Java specific problem and the Java SIG should reach consensus.
How do we serialize nested structures in non-OTLP formats? Currently in Jaeger & Zipkin exporters we're serializing arrays as JSON lists (["a","b","c"]), by analogy should we serialize nested structures as JSON objects? Or should flatten them like describes in this comment Support map values and nested values for attributes #376 (comment) ?
JSON is the simplest way. I think we should not enter the business of flattening. Also each protocol can define their ways of doing this, we care mostly about Jaeger and Zipkin.
Now that Attributes are also used in the metrics API, what would happen if you passed a nested Attributes instance to a meter? Since they're not allowed for metrics
They will continue to not be allowed for the moment
If we allow adding nested structures/maps to Attributes, what type should the attribute key represent? Also Attributes? E.g. in Java AttributeKey attributesKey(String key) would be the factory method
That is a very Java specific problem and the Java SIG should reach consensus.
I don't think this is Java-specific at all. This is a problem for all SIGs that want to strongly type their attributes and also for the semantic convention generator. What should the type of the attribute be declared as? A map<string, string>? That's our first complex type (arrays are currently coded by explicitly listing all possible array types string[], double[] etc.). Do we need arbitrary nesting or just maps from atomic primitive to atomic primitive or array thereof?
I think we need to understand that while allowing maps seems conceptually very simple, and also a no-op in OTLP, I think it opens a can of worms regarding tooling.
Also, I know of no backend that supports maps for tracing. @mwear also brought up performance concerns above in https://github.com/open-telemetry/opentelemetry-specification/issues/376#issuecomment-643488960 (and received many upvotes on that comment).
I don't think this is Java-specific at all. This is a problem for all SIGs that want to strongly type their attributes and also for the semantic convention generator. What should the type of the attribute be declared as? A
map<string, string>? That's our first complex type (arrays are currently coded by explicitly listing all possible array typesstring[],double[]etc.). Do we need arbitrary nesting or just maps from atomic primitive to atomic primitive or array thereof?
Exactly - for the HTTP headers use case the desired type would be map<string, string[]>, but what about other uses? Should we have to add map<string, string> too? Or map<string, ?>, map<int, string> etc. Or just say that we support nesting Attributes (which is sort of a map) inside Attributes.
I think we need to understand that while allowing maps seems conceptually very simple, and also a no-op in OTLP, I think it opens a can of worms regarding tooling.
Also, I know of no backend that supports maps for tracing. @mwear also brought up performance concerns above in #376 (comment) (and received many upvotes on that comment).
So the question here is: is it really worthwhile to introduce map-valued/nested attributes? Do we have any other use case than HTTP attributes? Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).
FYI, OTLP and Collector define the attributes field as map<string,AnyValue>, where AnyValue is one of the supported value types, including a map, so it is a recursive definition. (Technically the map is a represented as key/value list in OTLP wire protocol, but the semantics are of a map).
Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).
No changes are needed in the Collector, it is already fully supported.
No changes are needed in the Collector, it is already fully supported.
How does the collector handle exporting a nested map to Jaeger / Zipkin? Does it already JSON-encode?
Also, I know of no backend that supports maps for tracing.
From what I understand AS X-Ray does: https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html#xray-concepts-annotations
From the above link:
Metadata are key-value pairs with values of any type, including objects and lists
No changes are needed in the Collector, it is already fully supported.
How does the collector handle exporting a nested map to Jaeger / Zipkin? Does it already JSON-encode?
Yes. Jaeger translation calls AsString() which converts maps and arrays to JSON string.
I didn't check Zipkin, it should do the same.
Also, I know of no backend that supports maps for tracing.
From what I understand AS X-Ray does: https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html#xray-concepts-annotations
From the above link:
Metadata are key-value pairs with values of any type, including objects and lists
It's worth noting that AWS Metadata are not indexed/searchable:
Metadata are key-value pairs with values of any type, including objects and lists, but that are not indexed. Use metadata to record data you want to store in the trace but don't need to use for searching traces.
If you don't want to search by specific metadata values and just treat the whole object as one, indivisible entity then you might as well just serialize them to JSON and put them all into a string attribute; there's no real need to have a nested structure of attributes for them.
I can see a few exporters (in addition to X-ray) in the Collector which treat the nested like maps instead of flattening or converting to JSON or strings, so supposedly their corresponding backends support nested structures. I think this issue requires more feedback from vendors before we decide what to do about it.
So the question here is: is it really worthwhile to introduce map-valued/nested attributes? Do we have any other use case than HTTP attributes? Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).
Is there an alternative suggested to represent map data in attributes? One of the instrumentation libraries in JS sdk for browser sends map data as Span Events which is very inefficient.
{
"traceId": "bb2a647c845bb6d68dc415042f0ee49d",
"name": "documentLoad",
"id": "69a3580422564f5c",
"kind": 0,
"timestamp": 1654107849066400,
"duration": 121600,
"attributes": {
"component": "document-load",
"http.url": "http://localhost:5002/",
"http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36",
},
"status": {
"code": 0
},
"events": [
{
}
{
"name": "fetchStart",
"attributes": {},
"time": [
1654107849,
66400100.00000001
]
},
{
"name": "unloadEventStart",
"attributes": {},
"time": [
1654107849,
94600100
]
},
{
"name": "unloadEventEnd",
"attributes": {},
"time": [
1654107849,
97000100
]
},
{
"name": "domInteractive",
"attributes": {},
"time": [
1654107849,
104000100
]
},
{
"name": "domContentLoadedEventStart",
"attributes": {},
"time": [
1654107849,
187800100
]
},
{
"name": "domContentLoadedEventEnd",
"attributes": {},
"time": [
1654107849,
187900100
]
},
{
"name": "domComplete",
"attributes": {},
"time": [
1654107849,
188000100
]
},
{
"name": "loadEventStart",
"attributes": {},
"time": [
1654107849,
188000100
]
},
{
"name": "loadEventEnd",
"attributes": {},
"time": [
1654107849,
188000100
]
},
{
"name": "firstPaint",
"attributes": {},
"time": [
1654107849,
106000100
]
},
{
"name": "firstContentfulPaint",
"attributes": {},
"time": [
1654107849,
106000100
]
}
]
}