opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

Inject OTel tracestate when power-of-two probability sampling

Open jmacd opened this issue 2 years ago • 5 comments

Proposal

I propose to modify the probabilistic sampling processor to emit an OTel tracestate corresponding with the sampling probability in use, which is well defined in the current specification when the sampling probability is a power of two.

Specifically, if the sampling probability is a power of two such that Log2(probability) == -X where X is an integer, then the corresponding Span's tracestate should be extended with an entry ot=p:X. Here, "p" is the base-2 logarithm of "adjusted count".

When the Span already has a p-value, probabilities multiply. If the span does not have a p-value, which formally means "unknown" adjusted count, we will assume the count is 1 span corresponding with probability=1 (i.e., ot=p:0).

If the span already has an ot=p:Y property, the correct output is ot=p:Z for Z=X+Y (multiplying probabilities == addition inside Log2).

For example, if performing 50% sampling then we are multiplying adjusted counts by 2. Spans with no adjusted count on arrival will depart with p-value 1, and Spans with an adjusted count on arrival will depart with p-value greater by 1.

Other ot= properties than p, such as r, SHOULD pass through unmodified.

Additional context

The tracestate fields used here are defined in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md

This specification defines support only for power-of-two adjusted counts. Support for non-power-of-two adjusted counts would require more specification work, but is certainly possible. The proposal here is to leave the processor logic here unmodified, and only to extend p-value when the probability is a power of two. Some documentation will be required to warn users that arbitrary composition of these processors having mixed power-of-two and non-power-of-two probabilities will not behave correctly. Users who want this should user powers-of-two everywhere.

The type of sampling being performed here is considered "after-the-fact", since we are interpreting and mutating the Span data of finished spans. Trace context uses an r-value to describe a randomness value for contexts in flight to use making consistent decisions, whereas this processor needs only to interpret and set p-value.

jmacd avatar Feb 17 '22 20:02 jmacd

I think it would be better not to mix the information of consistent and inconsistent sampling decisions. If the sampling decision is not based on the r-value, it should not touch the p-value. Maybe, it is better to add some extra field (e.g. t for trace) for the sampling probability (or corresponding adjusted count) that was applied to the entire trace independently of r.

oertl avatar Feb 18 '22 06:02 oertl

Continuing with the idea of an extra field, why don't we reconsider already discussed explicit adjusted count c, but now taking precedence over p. The effective adjusted count would be

  • the value of c, if c is present
  • 2**p, if c is not present

Thus, a probabilistic sampling processor could generate the c-values from p-values and its own sampling probability. Furthermore, if we allow non-integer values for c, the processor does not have to be constrained to probabilities being powers of 2.

PeterF778 avatar Feb 22 '22 22:02 PeterF778

When there are multiple sampling stages, such as when consistent probability samplers are combined with this probabilistic sampling processor, each making sampling decisions independently (based on different random/hash values), it is generally important to know which sampling probability (or corresponding adjusted count) was used at which stage in order to have the chance to get the statistics right. If this sampling processor uses the same randomness as consistent probability samplers (the r-value), it would be correct to use/adapt the p field to propagate the adjusted count.

oertl avatar Feb 22 '22 23:02 oertl

Could you please expand on the use-cases for this? Is this so that users can analyze this data later and extrapolate based on the probability? And just to confirm: we are talking only about root spans here, right?

Would you also be open to providing concrete examples of input and output?

jpkrohling avatar Mar 02 '22 11:03 jpkrohling

Probabilistic sampling processor would typically live in a collector. Its role would be to further reduce the volume of data (spans), if for whatever reason it decides that it is too high. On input, it receives spans which have been already sampled by a tracer, more specifically, here we assume that consistent probability samplers were used (see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) and that the spans have their r-values and p-values. If sampling processor is not present (i.e. all spans are passed through), the p-values are sufficient to evaluate adjusted counts. However, if there's additional sampling processor(s) involved, the adjusted counts would be incorrect, if they were based on the original p-values. This applies to all spans, not just root spans.

PeterF778 avatar Mar 02 '22 23:03 PeterF778