opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

Support map values and nested values for attributes

Open tigrannajaryan opened this issue 4 years ago • 60 comments

After https://github.com/open-telemetry/opentelemetry-specification/pull/368 gets merged we will have support for array values.

If we add support for maps and nesting it will allow to represent arbitrary nested data structures in attribute values if needed.

This will apply to span and resource attributes.

tigrannajaryan avatar Dec 06 '19 16:12 tigrannajaryan

Can we explicitly state that this applies to resources too? I believe that span attributes and resources (which are only specified in the proto, currently) are specified with the same structure.

jmacd avatar Dec 06 '19 17:12 jmacd

Are there any use cases for arbitrary nesting? I think (multi)maps would be useful to store, e.g., HTTP headers, but what would be the rationale for arbitrary nesting?

Oberon00 avatar Dec 12 '19 12:12 Oberon00

Arbitrary nesting map can represent the classified values e.g. {"http" : {"url":...,"method":...}} or {"sql" : {"query":...,"engine":...}}. It can also host vendor specific data like {"aws": {"account_id":...}} in the situations when Resource isn't the right place, e.g. client side metrics

shengxil avatar May 05 '20 05:05 shengxil

In #579, Tigran's example seems to contain a use-case. The resource of "application B" is a set of key-value attributes.

jmacd avatar May 05 '20 05:05 jmacd

Semantically, I agree that the dotted string notation is equivalent to a map, although I'd like to point out, that at least from the tracing client perspective, dotted strings are a slightly more efficient representation.

Consider the following representations for the key-value pair 'http.method': 'GET'.

Dotted string representation { 'http.method': 'GET } To represent this we need 1 map and 2 strings; 3 total objects.

Map representation { { 'http' : { 'method': 'get' } } This requires 2 maps, 3 strings; 5 total objects.

Furthermore, most tracing backends do not support nested attributes (as far as I know), and will need to flatten them into dotted strings. This is something that will either have to been done in the tracing clients during export, or by the backends on ingest.

While I recognize that this does have some advantages in regards to semantics and for the data that can be represented, it does introduce complexity into tracing clients and backends. I'm not saying we shouldn't pursue this proposal, but we should discuss what the actual benefits are, and whether the added complexity is worth the tradeoff.

mwear avatar Jun 12 '20 21:06 mwear

I'd also like to add that with the dotted-string notation, tracing clients can reduce the runtime string allocations to 0 for attribute keys by introducing constants for semantic conventions (and any other commonly used keys). We would lose this ability by changing to nested maps.

I should also clarify that I am completely ok with array support. It's the nested map support that I have reservations about.

mwear avatar Jun 15 '20 16:06 mwear

@bogdandrutu can you please clarify why is this reopened?

tigrannajaryan avatar Jun 23 '20 14:06 tigrannajaryan

@tigrannajaryan because of the last week discussion and concerns raised by @mwear

bogdandrutu avatar Jun 23 '20 15:06 bogdandrutu

Was closing it even intentional? I can't remember any final decision in this issue. At least it's not documented here?

Oberon00 avatar Jun 24 '20 11:06 Oberon00

I guess closing this via a commit into some personal repository (tigrannajaryan/exp-otelproto@1507a2f) was accidental? It's surprising to me that GitHub allows this (I would even be tempted to call that a bug).

Oberon00 avatar Jul 10 '20 13:07 Oberon00

I guess closing this via a commit into some personal repository (tigrannajaryan/exp-otelproto@1507a2f) was accidental? It's surprising to me that GitHub allows this (I would even be tempted to call that a bug).

Definitely not intentional. I don't understand why does a commit in my personal repo which is not even a fork of this one result in closing an issue here. That's weird. Agree it appears to be a bug.

tigrannajaryan avatar Jul 10 '20 17:07 tigrannajaryan

To clarify: are nested arrays allowed, e.g. [][]int, or only single dimensional arrays e.g. []int

lizthegrey avatar Jul 20 '20 14:07 lizthegrey

Currently only homogeneous, single-dimensional arrays of primitives are allowed. This issue is about relaxing that. EDIT: Discussion is happening mostly on https://github.com/open-telemetry/opentelemetry-specification/pull/596 recently.

Oberon00 avatar Jul 20 '20 14:07 Oberon00

Currently only homogeneous, single-dimensional arrays of primitives are allowed. This issue is about relaxing that.

Great. So my linked PR at the moment meets the spec, but won't once we relax the criteria.

lizthegrey avatar Jul 20 '20 14:07 lizthegrey

I maintain we should not relax the criteria, but see #596. This has also been discussed in a few SIG Spec meetings.

Oberon00 avatar Jul 20 '20 14:07 Oberon00

Corresponding PR didn't recieve a single approve. https://github.com/open-telemetry/opentelemetry-specification/pull/596#issuecomment-673883652 Meaning nobody is seeking for this feature. Moving this issue to Future milestone.

SergeyKanzhelev avatar Aug 14 '20 04:08 SergeyKanzhelev

For the record for future discussions: map values and nested values are currently supported at the OTLP protocol level, but are not utilized by the OpenTelemetry API. This issue is to discuss whether we also want to support such values in the API.

tigrannajaryan avatar Aug 14 '20 19:08 tigrannajaryan

Given map values and nested values are supported at the OTLP protocol level, should we treat the current spec of MUST for only primitives and homogeneous lists to be the base requirement and potentially add a MAY for maps and nested values? The expanded types have been working well at a trace level, so strictly implementing the spec would require us to add validations and restrict inputs, which I'm hesitant to add when it's working and we remove the need for added operations.

https://github.com/open-telemetry/opentelemetry-erlang/issues/111

bryannaegele avatar Oct 01 '20 05:10 bryannaegele

This issue is missing specific use cases / requirements. Here's one: capturing all http headers of a request.

yurishkuro avatar Aug 30 '21 16:08 yurishkuro

I have a couple of questions/observations regarding the HTTP headers use case:

  1. If we allow adding nested structures/maps to Attributes, what type should the attribute key represent? Also Attributes? E.g. in Java AttributeKey<Attributes> attributesKey(String key) would be the factory method
  2. How do we serialize nested structures in non-OTLP formats? Currently in Jaeger & Zipkin exporters we're serializing arrays as JSON lists (["a","b","c"]), by analogy should we serialize nested structures as JSON objects? Or should flatten them like describes in this comment https://github.com/open-telemetry/opentelemetry-specification/issues/376#issuecomment-643488960 ?
  3. Now that Attributes are also used in the metrics API, what would happen if you passed a nested Attributes instance to a meter? Since they're not allowed for metrics

mateuszrzeszutek avatar Sep 10 '21 11:09 mateuszrzeszutek

If we allow adding nested structures/maps to Attributes, what type should the attribute key represent? Also Attributes? E.g. in Java AttributeKey<Attributes> attributesKey(String key) would be the factory method

That is a very Java specific problem and the Java SIG should reach consensus.

How do we serialize nested structures in non-OTLP formats? Currently in Jaeger & Zipkin exporters we're serializing arrays as JSON lists (["a","b","c"]), by analogy should we serialize nested structures as JSON objects? Or should flatten them like describes in this comment Support map values and nested values for attributes #376 (comment) ?

JSON is the simplest way. I think we should not enter the business of flattening. Also each protocol can define their ways of doing this, we care mostly about Jaeger and Zipkin.

Now that Attributes are also used in the metrics API, what would happen if you passed a nested Attributes instance to a meter? Since they're not allowed for metrics

They will continue to not be allowed for the moment

bogdandrutu avatar Sep 15 '21 09:09 bogdandrutu

If we allow adding nested structures/maps to Attributes, what type should the attribute key represent? Also Attributes? E.g. in Java AttributeKey attributesKey(String key) would be the factory method

That is a very Java specific problem and the Java SIG should reach consensus.

I don't think this is Java-specific at all. This is a problem for all SIGs that want to strongly type their attributes and also for the semantic convention generator. What should the type of the attribute be declared as? A map<string, string>? That's our first complex type (arrays are currently coded by explicitly listing all possible array types string[], double[] etc.). Do we need arbitrary nesting or just maps from atomic primitive to atomic primitive or array thereof?

I think we need to understand that while allowing maps seems conceptually very simple, and also a no-op in OTLP, I think it opens a can of worms regarding tooling.

Also, I know of no backend that supports maps for tracing. @mwear also brought up performance concerns above in https://github.com/open-telemetry/opentelemetry-specification/issues/376#issuecomment-643488960 (and received many upvotes on that comment).

Oberon00 avatar Sep 15 '21 09:09 Oberon00

I don't think this is Java-specific at all. This is a problem for all SIGs that want to strongly type their attributes and also for the semantic convention generator. What should the type of the attribute be declared as? A map<string, string>? That's our first complex type (arrays are currently coded by explicitly listing all possible array types string[], double[] etc.). Do we need arbitrary nesting or just maps from atomic primitive to atomic primitive or array thereof?

Exactly - for the HTTP headers use case the desired type would be map<string, string[]>, but what about other uses? Should we have to add map<string, string> too? Or map<string, ?>, map<int, string> etc. Or just say that we support nesting Attributes (which is sort of a map) inside Attributes.

I think we need to understand that while allowing maps seems conceptually very simple, and also a no-op in OTLP, I think it opens a can of worms regarding tooling.

Also, I know of no backend that supports maps for tracing. @mwear also brought up performance concerns above in #376 (comment) (and received many upvotes on that comment).

So the question here is: is it really worthwhile to introduce map-valued/nested attributes? Do we have any other use case than HTTP attributes? Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).

mateuszrzeszutek avatar Sep 15 '21 14:09 mateuszrzeszutek

FYI, OTLP and Collector define the attributes field as map<string,AnyValue>, where AnyValue is one of the supported value types, including a map, so it is a recursive definition. (Technically the map is a represented as key/value list in OTLP wire protocol, but the semantics are of a map).

Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).

No changes are needed in the Collector, it is already fully supported.

tigrannajaryan avatar Sep 15 '21 15:09 tigrannajaryan

No changes are needed in the Collector, it is already fully supported.

How does the collector handle exporting a nested map to Jaeger / Zipkin? Does it already JSON-encode?

Oberon00 avatar Sep 15 '21 15:09 Oberon00

Also, I know of no backend that supports maps for tracing.

From what I understand AS X-Ray does: https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html#xray-concepts-annotations

From the above link:

Metadata are key-value pairs with values of any type, including objects and lists

tigrannajaryan avatar Sep 15 '21 15:09 tigrannajaryan

No changes are needed in the Collector, it is already fully supported.

How does the collector handle exporting a nested map to Jaeger / Zipkin? Does it already JSON-encode?

Yes. Jaeger translation calls AsString() which converts maps and arrays to JSON string.

I didn't check Zipkin, it should do the same.

tigrannajaryan avatar Sep 15 '21 15:09 tigrannajaryan

Also, I know of no backend that supports maps for tracing.

From what I understand AS X-Ray does: https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html#xray-concepts-annotations

From the above link:

Metadata are key-value pairs with values of any type, including objects and lists

It's worth noting that AWS Metadata are not indexed/searchable:

Metadata are key-value pairs with values of any type, including objects and lists, but that are not indexed. Use metadata to record data you want to store in the trace but don't need to use for searching traces.

If you don't want to search by specific metadata values and just treat the whole object as one, indivisible entity then you might as well just serialize them to JSON and put them all into a string attribute; there's no real need to have a nested structure of attributes for them.

mateuszrzeszutek avatar Sep 17 '21 12:09 mateuszrzeszutek

I can see a few exporters (in addition to X-ray) in the Collector which treat the nested like maps instead of flattening or converting to JSON or strings, so supposedly their corresponding backends support nested structures. I think this issue requires more feedback from vendors before we decide what to do about it.

tigrannajaryan avatar Sep 17 '21 16:09 tigrannajaryan

So the question here is: is it really worthwhile to introduce map-valued/nested attributes? Do we have any other use case than HTTP attributes? Because if it's only that one use case the cost of implementing that seems to be pretty high (new attribute key type support + exporter changes in all language SIGs, collector included; backend support).

Is there an alternative suggested to represent map data in attributes? One of the instrumentation libraries in JS sdk for browser sends map data as Span Events which is very inefficient.

{
    "traceId": "bb2a647c845bb6d68dc415042f0ee49d",
    "name": "documentLoad",
    "id": "69a3580422564f5c",
    "kind": 0,
    "timestamp": 1654107849066400,
    "duration": 121600,
    "attributes": {
        "component": "document-load",
        "http.url": "http://localhost:5002/",
        "http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36",
 

    },
    "status": {
        "code": 0
    },
    "events": [

      {

}

        {
            "name": "fetchStart",
            "attributes": {},
            "time": [
                1654107849,
                66400100.00000001
            ]
        },
        {
            "name": "unloadEventStart",
            "attributes": {},
            "time": [
                1654107849,
                94600100
            ]
        },
        {
            "name": "unloadEventEnd",
            "attributes": {},
            "time": [
                1654107849,
                97000100
            ]
        },
        {
            "name": "domInteractive",
            "attributes": {},
            "time": [
                1654107849,
                104000100
            ]
        },
        {
            "name": "domContentLoadedEventStart",
            "attributes": {},
            "time": [
                1654107849,
                187800100
            ]
        },
        {
            "name": "domContentLoadedEventEnd",
            "attributes": {},
            "time": [
                1654107849,
                187900100
            ]
        },
        {
            "name": "domComplete",
            "attributes": {},
            "time": [
                1654107849,
                188000100
            ]
        },
        {
            "name": "loadEventStart",
            "attributes": {},
            "time": [
                1654107849,
                188000100
            ]
        },
        {
            "name": "loadEventEnd",
            "attributes": {},
            "time": [
                1654107849,
                188000100
            ]
        },
        {
            "name": "firstPaint",
            "attributes": {},
            "time": [
                1654107849,
                106000100
            ]
        },
        {
            "name": "firstContentfulPaint",
            "attributes": {},
            "time": [
                1654107849,
                106000100
            ]
        }
    ]
}

scheler avatar Jul 20 '22 23:07 scheler