opentelemetry-go icon indicating copy to clipboard operation
opentelemetry-go copied to clipboard

baggage: Accept non-ASCII keys

Open pellared opened this issue 1 year ago • 8 comments

Per https://github.com/open-telemetry/opentelemetry-specification/pull/3801

The new wording makes Go implementation non-compliant, but it's not a breaking change for its API, you will just need to relax the checks (i.e. never return errors from baggage.New)

Originally posted by @yurishkuro in https://github.com/open-telemetry/opentelemetry-specification/issues/3801#issuecomment-1939611736

pellared avatar Feb 19 '24 12:02 pellared

I need help here.

The blank space should be considered as a valid UTF-8 string, but it also means would be a valid key.

p, err := NewKeyProperty(" ")
assert.NoError(t, err)

The following case also became available that ąść is a valid key that will be encoded as %C4%85%C5%9B%C4%87.

		{
			name: "encoded UTF-8 string in key",
			in:   "%C4%85%C5%9B%C4%87=%C4%85%C5%9B%C4%87",
			want: baggage.List{
				"ąść": {Value: "ąść"},
			},
		},

Do you think these cases should be considered valid cases? @pellared

XSAM avatar Mar 25 '24 02:03 XSAM

p, err := NewKeyProperty(" ")
assert.NoError(t, err)

This is a valid correct test as it touches the Baggage API (not propagator).

The following case also became available that ąść is a valid key that will be encoded as %C4%85%C5%9B%C4%87.

Right now, the specification for W3C baggage propagator does NOT support percent encoding of the key. From https://www.w3.org/TR/baggage/#key:

3.2.1.2 key A token which identifies a value in the baggage. token is defined in RFC7230, Section 3.2.6.

func (b Baggage) String() string should skip encoding of the list members and properties which contain invalid keys (according to https://www.w3.org/TR/baggage/#key).

Alternatively, the W3C baggage specification could allow percent encoding of the keys. CC @yurishkuro @dyladan

pellared avatar Mar 25 '24 12:03 pellared

Percent encoding of keys is something that has been considered and rejected by W3C. Worth noting that there is nothing stopping an implementation from doing it; all the required characters are allowed. If the w3c specification added it though, it creates a lot of questions. For example, if you have an entry for key1 and an entry for key%31 is this a duplicate entry? It forces additional processing requirements on receivers that we prefer to avoid.

dyladan avatar Mar 25 '24 15:03 dyladan

Worth noting that there is nothing stopping an implementation from doing it; all the required characters are allowed.

I think it is better (safer) not doing as the implementations (e.g. in other languages) may not percent-decode the keys. This may lead to unpredictable behavior. @dyladan, does it make sense?

pellared avatar Mar 25 '24 15:03 pellared

I didn't mean go should do it. I meant the otel spec could do it if desired then all languages would do it consistently

dyladan avatar Mar 25 '24 15:03 dyladan

From baggage spec:

Baggage names are any valid UTF-8 strings. Language API SHOULD NOT restrict which strings are used as baggage names. However, the specific Propagators that are used to transmit baggage entries across component boundaries may impose their own restrictions on baggage names. For example, the W3C Baggage specification restricts the baggage keys to strings that satisfy the token definition from RFC7230, Section 3.2.6. For maximum compatibility, alpha-numeric names are strongly recommended to be used as baggage names.

It mentions the baggage could accept any valid UTF-8 as a key but not for the W3C baggage propagator. So, I guess we can implement UTF-8 in baggage API but drop the key with UTF-8 in the W3C baggage propagator.

XSAM avatar Mar 25 '24 17:03 XSAM

The spec allows any UTF8 strings as baggage keys, case sensitive. It already recommends these unit tests for Baggage API:

baggage.Set('a', 'B% 💼');
baggage.Set('A', 'c');
baggage.Get('a'); // returns "B% 💼"
baggage.Get('A'); // returns "c"

I would add two more tests where the key itself contains an emoji or the key is a space - both are valid.

For the W3C propagator, there is already a written compatibility test: https://github.com/w3c/baggage/blob/main/test/test_baggage.py. One thing that's seems missing there is the handling of malformed inputs (I booked https://github.com/w3c/baggage/issues/131)

yurishkuro avatar Mar 25 '24 23:03 yurishkuro

I didn't mean go should do it. I meant the otel spec could do it if desired then all languages would do it consistently

I do not think it should be done especially that % is an valid value in the token. See: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6

pellared avatar Mar 27 '24 15:03 pellared