Minor spec change around UNAVAILABLE / RetryInfo
It doesn't make sense that we recommend a RetryInfo with time 0, none of the implementations i've seen special case this value, and it goes against the rest of the spec which says to set it to a real integer.
It also doesn't make since to say the server "MUST" use UNAVAILABLE for retryable errors, and then go on to mention other retryable error status codes..
rest of the spec which says to set it to a real integer
Where does it say that?
https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlpgrpc-throttling - this is the section that talks about how to use RetryInfo.
To signal backpressure when using gRPC transport, the server MUST return an error with code Unavailable and MAY supply additional details via status using RetryInfo. Here is a snippet of sample Go code to illustrate:
The only other mention of RetryInfo in the spec is:
The client SHOULD interpret RESOURCE_EXHAUSTED code as retryable only if the server signals that the recovery from resource exhaustion is possible. This is signaled by the server by returning a status containing RetryInfo. In this case the behavior of the server and the client is exactly as described in OTLP/gRPC Throttling section. If no such status is returned, then the RESOURCE_EXHAUSTED code SHOULD be treated as non-retryable.
This part of the spec I'm removing doesn't make sense.. Why must retryable errors return a RetryInfo of 0 and UNAVAILABLE ?
i think this can get merged now
@tigrannajaryan, this "MUST requirement" does not make sense and is conflicting with other parts of the specification.
This
The server MUST indicate retryable errors using code Unavailable
conficts with
The server MAY use other gRPC codes to indicate retryable [...] errors.
This
MAY supply additional details via status using RetryInfo containing 0 value of RetryDelay
seems to have only sense for RESOURCE_EXHAUSTED but it is is already described in a dedicated sentence
https://github.com/open-telemetry/opentelemetry-proto/blob/68f9c6329ca91f96333ee5b240e305945b122b70/docs/specification.md#L294-L301
it is very strange to call out 0 especially given there is also an example like this:
https://github.com/open-telemetry/opentelemetry-proto/blob/68f9c6329ca91f96333ee5b240e305945b122b70/docs/specification.md#L311-L341
@DylanRussell, shouldn't we also change
https://github.com/open-telemetry/opentelemetry-proto/blob/68f9c6329ca91f96333ee5b240e305945b122b70/docs/specification.md?plain=1#L311
to
To signal backpressure when using gRPC transport, the server SHOULD return an
Yeah those suggestions SG. I've updated the PR
This fixes a bug in the spec and should be allowed.
@open-telemetry/technical-committee, can someone merge it? :wink:
Can't merge, the checks are stuck.
@open-telemetry/technical-committee can we have a few more eyes on this? I want to make sure we didn't miss anything.