opentelemetry-python Incorrect validation of baggage values

Describe your environment python: 3.10.6 opentelemetry-api: 1.12.0 opentelemetry-sdk: 1.12.0

Steps to reproduce

from opentelemetry.baggage import _is_valid_pair
_is_valid_pair("sentry-transaction", "GET%20%2Fapi%2F%2Freport") # OK

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

What is the expected behavior? Opentelemetry should be able to parse baggage values which are percent encoded, as that's indicated by the W3C spec

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

What is the actual behavior? No baggage values were extracted and the following warning was raised:

Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

Additional context The problem is happening here: https://github.com/open-telemetry/opentelemetry-python/blob/75313b6c6d58945c1401622b6683ccdd28657984/opentelemetry-api/src/opentelemetry/baggage/propagation/init.py#L92

Note also that the indicated code path seems to be not be too optimized, as the call to set_baggage will validate both the key and values again. And additionally, for each baggage value a new context will be created. Maybe it'd be better to first collect all baggage values and set them only once in the context.

Sep 15 '22 20:09 sk-

@srikanthccv @ocelotl what do you think?

Sep 21 '22 00:09 sk-

This looks like a legitimate bug. I haven't gotten time to triage this. I will check.

Sep 21 '22 06:09 srikanthccv

Checking...

Sep 26 '22 12:09 ocelotl

Ok, I think this is what's happening here:

The definition for the baggage value says:

value                  =  *baggage-octet
baggage-octet          =  %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                          ; US-ASCII characters excluding CTLs,
                          ; whitespace, DQUOTE, comma, semicolon,
                          ; and backslash

That is why our regex value is this:

_VALUE_FORMAT = r"[\x21\x23-\x2b\x2d-\x3a\x3c-\x5b\x5d-\x7e]*"

That regex does not allow space characters, the reported string has one in %20. The spec for value says:

Leading and trailing whitespaces (OWS) are allowed and are not considered to be a part of the value.

I think that means we should first strip any leading and trailing whitespace characters from the value string, then apply the regex (we don't do that so I think we do have a bug).

The space character in "GET%20%2Fapi%2F%2Freport" is not leading nor trailing so I think we are doing the right thing by not accepting this string.

What do you think, @srikanthccv?

Sep 26 '22 12:09 ocelotl

@ocelotl in the same spec you are citing you are missing the important part

Any characters outside of the baggage-octet range of characters MUST be percent-encoded.

Sep 26 '22 13:09 sk-

We are running into a similar issue. FWIW, The OTEL client for NodeJS handles values with spaces so long as they are percent-encoded.

Nov 01 '22 19:11 bruce-y

I'm not sure extract should accept percent-encoded strings that have not been percent-encoded themselves, from here:

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.

I think extract is doing the right thing by rejecting a string that comes with a %20 character because it expects that string to come from the result of an inject method who should have percent-encoded that %20 (leading to %2520). What is happening here is that a percent-encoded string is being passed to extract without first passing it to inject, which leads to an invalid string which does not have its percent-encoded characters percent-encoded themselves. When the same string is passed first to inject the string is handled right:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "baggage",
    "sentry-transaction=GET%20%2Fapi%2Freport",
    context=get_current()
)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)

{'baggage': 'baggage=sentry-transaction%3DGET%2520%252Fapi%252Freport'}
{'baggage-b037abe3-81b2-4195-86e6-929af2d86e99': {'baggage': 'sentry-transaction=GET%20%2Fapi%2Freport'}}

It is impossible for extract to handle percent-encoded strings that have not been percent-encoded themselves before, because how can extract tell if a percent symbol comes from the original string (then expect should leave it as it is) or from the encoding done by inject (then expect should decode it)?

I don't think this is a bug, WDYT @srikanthccv?

Nov 17 '22 15:11 ocelotl

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

Note however, that in Java, the value is always percent encoded, so the case above would not be ambiguous, as in the first case it would be encoded as a%2520b and in the second as a%20b. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/W3CBaggagePropagator.java#L68

In Java, it also seems there's no validation happening for the value, and the value is always being decoded. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/Parser.java#L128 and the function that calls it.

Note though, that your example should be:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "sentry-transaction",
    "GET /api/report",
    context=get_current()
)

print(context)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

Nov 17 '22 19:11 sk-

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

What I am saying is that for a baggage value to be parseable by extract it has to be percent-encoded first. In your example above you mentioned this:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

That string "sentry-transaction=GET%20%2Fapi%2Freport" is not percent-encoded and that is why you are getting that error. Out of curiosity, why would you pass a non-percent encoded string to extract?

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

Sorry, how are we restricting the values? :thinking:

Nov 17 '22 19:11 ocelotl

@ocelotl that value is percent encoded correctly, and corresponds to the value GET /api/report. That is a baggage value generated by Sentry, and it is encoded correctly according to the spec.

The problem with your example is that you are misunderstanding how percent encoding works. Please refer to the code used in the java package.

As I mentioned originally, the problem is that the validation of the value is happening after decoding it. But, you should first validate the encoded value conforms to the character range and then decode its value.

You are restricting the values, because the spec, which I have cited many times already, says

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

So you are allowed to have any string as a value, but the function set_baggage rejects any string with characters outside the baggage-octet range.

Pleas also note that there are at least 2 other open-telemetry implementations that handle this.

@srikanthccv could you please take a look

Nov 18 '22 00:11 sk-

Ah, ok, now I can understand what you mean, fixing...

Nov 18 '22 08:11 ocelotl

@ocelotl Would you like to assign this yourself?

Nov 18 '22 19:11 srikanthccv

Related, decoding resource attributes from otlpresourcedetector: https://github.com/open-telemetry/opentelemetry-python/pull/3046

Nov 21 '22 23:11 lzchen

@ocelotl Would you like to assign this yourself?

I did already.

Nov 28 '22 20:11 ocelotl

What is the status of this? Major issue for production release if we cant contain spaces

Dec 20 '22 03:12 ghandic

@ghandic

There is a PR open pending reviews for this issue. Once that is merged, it should go into the next release.

Jan 09 '23 21:01 lzchen

opentelemetry-python opentelemetry-python copied to clipboard

Incorrect validation of baggage values

opentelemetry-python
opentelemetry-python copied to clipboard