opentelemetry-specification Review approach & specify algorithm for TraceIdRatioBasedSampler (ProbabilitySampler)

Review approach & specify algorithm for TraceIdRatioBasedSampler (ProbabilitySampler)

Open Oberon00 opened this issue 3 years ago • 25 comments

See also discussion on https://github.com/open-telemetry/opentelemetry-specification/pull/1412#issuecomment-776101910

The sampling alorithm for TraceIdRatioBasedSampler is unspecified. As a result, trace IDs that are sampled by some implementations might get non-sampled or re-sampled by SDKs in other languages, even though they have the same or a a higher probability than the parent.

TODO list for this issue:

Is this a problem at all? Or is the ParentBased approach enough (in combination with any, not necessarily trace-id-based probability-based sampling of root spans)?
~~If it is a problem, since trace IDs can come from untrusted, non-random sources, do we open up a DDoS/Security/performance issue when using trace IDs as sole, deterministic input for our sampling algorithm? Do we need to put a warning there?~~ Let's assume it is no problem for this issue, this should be handled in #1188.
If applicable after checking the above we determine that we need a consistent algorithm, actually do specify it (maybe based on #331).

Feb 09 '21 15:02 Oberon00

If it is a problem, since trace IDs can come from untrusted, non-random sources, do we open up a DDoS/Security/performance issue when using trace IDs as sole, deterministic input for our sampling algorithm? Do we need to put a warning there?

If we do re-use the traceID from untrusted sources we have bigger problems than the sampler. I think that topic should be addressed independently and separately. For example you may overload the backend, you may overload the collector if you are doing a group by trace id, etc. I think there are a lot of failing scenarios when the trace id is untrusted that it deserve its own section and should be resolved at the server span level. One solution that I proposed was to simply use the incoming span context as a link instead of a parent. we can discuss more about that :)

Feb 09 '21 17:02 bogdandrutu

If we do re-use the traceID from untrusted sources we have bigger problems than the sampler. I think that topic should be addressed independently and separately.

I think you are right. I will remove this item from the enumeration above. Is #1188 the right issue for that discussion?

Feb 09 '21 17:02 Oberon00

Is this a problem at all? Or is the ParentBased approach enough (in combination with any, not necessarily trace-id-based probability-based sampling of root spans)?

I've seen implementation that use "inflationary sampling probability", and the algorithm is important for them. For example if Service A (0.2% sampling) -> Service B (0.4% sampling) and you want to end up with about 0.2 full traces and another 0.4 (at Service B level, including the Service A traces).

The algorithm that I've seen (I think @jmacd was one of the author) is that if you have the same algorithm across Services (which means across languages as well) that has a deterministic implementation, and also ensures that every trace sampled at a lower rate will be sampled at a higher rate, then you can achieve that.

traceId1 = "..." // sampled at 0.2 rate traceId2 = "..." // sampled at 0.4 rate, but not at 0.2 traceId3 = "..." // not sampled at 0.4 rate

ServiceASampler = new TraceIdBasedSampler(0.2):

traceId	result
1	true
2	false
3	false

ServiceBSampler = new TraceIdBasedSampler(0.4):

traceId	result
1	true
2	true
3	false

Feb 09 '21 18:02 bogdandrutu

See https://github.com/open-telemetry/oteps/pull/148

Mar 15 '21 21:03 jmacd

The algorithm that I've seen (I think @jmacd was one of the author)

FYI for the record this wasn't my doing. This "inflationary" sampling technique predated me on that project. 😀

May 26 '21 05:05 jmacd

The Go implementation of this algorithm has lead to an incompatibility Amazon XRay Trace IDs. The first 4 bytes of XRay Trace IDs are time based and the Go ratio sampler expects these bytes to be random.

It would be advantageous to the Go SIG, and likely others, if we could resolve this issue. That way, when we change our algorithm we will only need to do so once.

Sep 12 '22 16:09 MrAlias

@Aneurysm9 had mentioned we switch to "hashing" the middle part of the trace ID to make a sampling decision.

Sep 12 '22 16:09 MrAlias

Current Go implementation for reference:

func (ts traceIDRatioSampler) ShouldSample(p SamplingParameters) SamplingResult {
	psc := trace.SpanContextFromContext(p.ParentContext)
	x := binary.BigEndian.Uint64(p.TraceID[0:8]) >> 1
	if x < ts.traceIDUpperBound {
		return SamplingResult{
			Decision:   RecordAndSample,
			Tracestate: psc.TraceState(),
		}
	}
	return SamplingResult{
		Decision:   Drop,
		Tracestate: psc.TraceState(),
	}
}

Sep 12 '22 16:09 MrAlias

@MrAlias that is why the w3c specification is actually moving to make it explicit which bytes should be random. You can see the draft here https://github.com/w3c/trace-context/blob/main/spec/20-http_request_header_format.md#trace-id

The reason we used 7 and not 8 bytes is that the 8th byte contains the sign bit. This is the same reason go has the method to generate a 63 bit random number.

Sep 13 '22 12:09 dyladan

@MrAlias that is why the w3c specification is actually moving to make it explicit which bytes should be random. You can see the draft here https://github.com/w3c/trace-context/blob/main/spec/20-http_request_header_format.md#trace-id

The reason we used 7 and not 8 bytes is that the 8th byte contains the sign bit. This is the same reason go has the method to generate a 63 bit random number.

Ah, super helpful! Thanks 🙏

Sep 13 '22 20:09 MrAlias

Is the plan for OTel to just adopt this ^ when it lands?

Sep 13 '22 20:09 MrAlias

@MrAlias that is why the w3c specification is actually moving to make it explicit which bytes should be random. You can see the draft here https://github.com/w3c/trace-context/blob/main/spec/20-http_request_header_format.md#trace-id

The reason we used 7 and not 8 bytes is that the 8th byte contains the sign bit. This is the same reason go has the method to generate a 63 bit random number.

@dyladan does this mean that the AWS XRay TraceID which uses a non psudo-random value for the left-most 4 bytes (based on the time) would not be W3C complaint?

Looking into switching the Go implementation to the mentioned algorithm, I think the XRay spans would continue to be sampled in a non-random manner given this static prefix.

cc @Aneurysm9

Sep 13 '22 20:09 MrAlias

Trace IDs are 16 bytes, so there should be no issue.

|  0 1 2 3 | 4 5 6 7 | 8 | 9 a b c d e f |
      A         B      C         D

X-Ray IDs will have non-random data in region A (bytes 0-3) and random data in regions B, C, and D (bytes 4-f). The w3c proposal is to guarantee that region D (bytes 9-f) contain random data. The current implementation uses regions A and B, shifted right by one place so it is effectively bytes 0-6:

x := binary.BigEndian.Uint64(p.TraceID[0:8]) >> 1

Instead, we could use region D directly:

x := binary.BigEndian.Uint64(p.TraceID[9:])

This would also make the sampler safe to use with 64-bit trace IDs still generated by some legacy systems.

Sep 14 '22 18:09 Aneurysm9

I am expecting us to use the W3C trace context "random" flag to address this issue: https://github.com/w3c/trace-context/pull/474 @dyladan can you provide guidance?

Sep 14 '22 18:09 jmacd

We spoke about this at the w3c meeting yesterday actually. I was going to bring it up at the next maintainers meeting. The level 2 spec for trace context was delayed by some extended summer vacations but is about to go into wide review for publication as a recommendation. Obviously until something is an official recommendation the working group can't guarantee anything, but we do not expect any major changes. Here are the important points:

The current spec requires that the rightmost 7 bytes MUST be random if the random bit is set. 7 Was chosen because 8 would have included the sign bit for some methods of generating random numbers.
If the random bit is not set, there are no guarantees.
The exact number of bytes is not expected to change, but is not guaranteed until the wide review is complete. During the review it is conceivable that someone challenges the number we have chosen (either to say 7 is too much and wastes resources of a RNG or to say that it is not enough to cover some usecase). The farther right in the ID you go, the "safer" the assumption is that the random bit will apply to the bits/bytes in question.
The "rightmost" semantic is extremely unlikely to change due to the prior art of some tracing systems using the left bits for nonrandom data, and 0-padding short trace ids is done on the left side.

@dyladan does this mean that the AWS XRay TraceID which uses a non psudo-random value for the left-most 4 bytes (based on the time) would not be W3C complaint?

No. The randomness requirement only applies to traceparents where the random flag is set to 1 for backwards compatibility reasons. Also, the nonrandom bits for xray are the leftmost bits and the flag only applies to the rightmost bits. ~~I am not sure how the rest of the ID is generated~~.

edit: Missed that @Aneurysm9 clarified the region in question is random.

@dyladan can you provide guidance?

I would feel safe using the rightmost 7 bytes as my random number, and the fewer rightmost bytes that are used the safer I would feel. Hashing the random part of the ID or the whole ID would be another way to guarantee safety, but also comes at the cost of implementation complexity (and ensuring all implementations are the same).

I would probably recommend to restrict to inverse power of 2 sampling probabilities (1/2, 1/4, 1/8, etc) which would allow you to use the minimum number of bits without hashing. For example, a 50% sampling rate only needs the single rightmost bit, a 25% sampling rate needs only the rightmost 2 bits, etc. This has also been discussed to have other benefits with respect to @jmacd's and @oertl's probability propagation proposal.

edit: alternative to power of 2 restriction would be restricting probability to a whole number percent. Only 7 bits are required to represent every whole number from 1 to 99

Sep 14 '22 18:09 dyladan

@dyladan Thank you. I am pleased to see that we are nearly ready to adopt the random bit for w3c traceparent. Those familiar with the current tracestate-based proposal for probability sampling may remember that we would be able to eliminate the r variable once we had a consistent random source. Meanwhile, the p variable would still compactly propagate the power-of-two sampling probabilities (for use in span-to-metrics pipelines), but that if we had "plenty" of random bits available we could easily extend that specification to accommodate propagating non-powers-of-two sampling probabilities.

If we have 7-bytes of randomness, it means we can agree to a consistent method to evaluate TraceIDRatioBased sampling policy like the OTel-Go example in https://github.com/open-telemetry/opentelemetry-specification/issues/1413#issuecomment-1244007561.

Sep 16 '22 20:09 jmacd

p value propagation I hope is next in level 3. for now it was agreed to leave it out of the level 2 spec because the r value was less controversial and less likely to get held up in wider review

Sep 16 '22 20:09 dyladan

We discussed this in the 9/22 Sampling SIG. The action items were loosely discussed and will continue in the next SIG (10/6).

My opinions, roughly, are:

Assuming W3C gives us 56 bits of randomness, compute a sampling threshold T = S * 2^57 in the range [0, 2^57].
When a W3C tracecontext w/o the new random flag is sampled, SDKs should use an unspecified hashing algorithm on the TraceID to construct 56 questionably-random bits. They should not expect consistent sampling, in this case, and should restrict usage to root-only sampling.
Take the 56 bits of randomness and construct a 56-bit number R in a specified, consistent manner, return the sampling decision if R < T.

The same decision would be returned by a Consistent Probability Sampler as in the experimental specification here: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md.

Moreover, the current experimental specification can be updated to rely on the W3C randomness bit, which is a huge improvement for us (thank you W3C TraceContext group, thank you @dyladan!) as follows:

R-value is no-longer needed, can be dropped (assuming we do not support probability sampling for TraceIDs without the random flag). The smallest sampling probability supported becomes 2^-57
P-value would be used when the threshold T equals a power-of-two (i.e., has one bit set)
a new T-value would be used when the threshold T is not a power-of-two, encoded presumably in hexadecimal. To express a sampling rate like 3-in-4, we want a threshold like T=c0000000000000 (which makes me want to drop trailing zeros of the threshold, so 3-in-4 could be encoded as just T=c).

cc: @oertl @spencerwilson @PeterF778 @kalyanaj @dyladan

Sep 22 '22 16:09 jmacd

Thanks for the update. @jmacd mind sharing what the reasoning was behind using 56 bits? Is it just because that is the number the W3C already has in the draft spec or was there some need for sampling thresholds with that level of randomness?

Sep 22 '22 16:09 dyladan

I do not believe anyone requires 56-bits of sampling precision. I'm interested in what others think is a good value, maybe 16 or 20 bits will do.

Sep 22 '22 16:09 jmacd

@jmacd

I do not believe anyone requires 56-bits of sampling precision. I'm interested in what others think is a good value, maybe 16 or 20 bits will do.

I am not sure if 16 bits are enough if really arbitrary sample rates should be supported. With 16 bits, the smallest possible sampling rate would be 1/2^16 = 0.00001525878 and the second smallest possible sampling rate would be 2/2^16 = 0.00003051757. There is a large relative gap between these two sampling rates.

Sep 23 '22 06:09 oertl

@jmacd

When a W3C tracecontext w/o the new random flag is sampled, SDKs should use an unspecified hashing algorithm on the TraceID to construct 56 questionably-random bits. They should not expect consistent sampling, in this case, and should restrict usage to root-only sampling.

If the hashing algorithm is not specified, the SDKs can simply use any random number. Using the trace ID is not very useful in this case.

Sep 23 '22 06:09 oertl

For an assessment on the lower end of needed probabilities, let's look at one example. Google handles about 100 billion requests daily. If we design long term storage for traces which decreases traces cardinality as the data gets older, and keeps only 1000 traces per day (for example, for data older than 5 years), we need probability of about 2^-28. So, I'd say, we need at least 31 random bits, and this is still playing with chances.

Sep 30 '22 16:09 PeterF778

When a W3C tracecontext w/o the new random flag is sampled, SDKs should use an unspecified hashing algorithm on the TraceID to construct 56 questionably-random bits.

When the new random flag is NOT set, couldn't we still require SDKs to use the SAME hashing algorithm (instead of an unspecified algorithm)? i.e., a best effort in treating that the same last set of bytes in traceID is random...

Reasoning: My understanding is that the TraceID generated by many systems today, though not required by the Level 1 of the W3C TraceContext spec, have their rightmost bytes randomly generated. So, shouldn't we do a "best effort" consistent probability sampling as adoption of the new flag can take a while (W3C TraceContext level 2 spec has to get to recommendation stage, implementations have to adopt it etc.).?

Oct 06 '22 02:10 kalyanaj

This was discussed by me @oertl @PeterF778 @kalyanaj and @kentquirk in the Sampling SIG today. Notes:

The W3C random flag is anticipated to go ahead, but we also expect it could be a year-or-so before it is deployed in OTel SDKs.

We debated whether 56-bits, 48-bits, or 32-bits of randomness would be preferred. There is not a strong preference between 48 and 56, but we think 32 bits is not sufficient.

There was a brief question of whether we might wish to reserve some (e.g., 8) bits of the TraceID for future/alternative use on the assumption that 16-bytes is more than sufficient for global uniqueness (provided 48 bits are truly-random bits). For example, we could directly encode today's powers-of-two-sampling r-value using 6 bits of the TraceID, which drastically reduces the amount of randomness required per trace for consistent probability sampling. (Why: the step to generate r-value uses 2 (expected) random bits per TraceID.). On the other hand, with W3C support we could simply append extra bytes to the traceparent header to reduce the cost of randomness by a logarithmic factor.

We discussed hashing approaches and were reminded why we don't like them (they're expensive, faulty, and not portable, see https://github.com/rurban/smhasher).

We discussed how to test for suitably random TraceIDs, it could follow this previous work: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md#appendix-statistical-test-requirements

We arrived at the following recommendation to address this issue. Assume that the least-significant 7 bytes of the TraceID are random _as though the anticipated W3C random flag were set. Thus, we will generate a 56-bit number and follow exactly the recommendation made by @Aneurysm9, i.e.,

This is compatible with X-ray and we believe it is compatible with all existing OTel SDKs. The work remaining for this issue, if the proposal is accepted, will be to update the Trace SDK specification with details. The TraceIDRatioBasedSampler.ShouldSample() logic uses

traceThreshold := uint64(samplingProbability * (1 << 57))

func ShouldSample() bool {
   traceValue := binary.BigEndian.Uint64(p.TraceID[9:])
   return traceValue < traceThreshold
}

The group reasons that this is no worse than doing nothing at all, and assuming that the W3C proposal does not change this is also forward-compatible.

This traceThreshold shown above can be used to convey non-power-of-two adjusted counts. They could be propagated (as described in my comment above) using a tracestate t-value with an update to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md.

Oct 06 '22 17:10 jmacd

During our group meeting on Oct 20, @PeterF778, @kalyanaj, @spencerwilson and @kentquirk identified some use cases which are easier to handle if sampling decisions are based on r-value generated independently from the trace-id. One is with tracking user sessions and the other is with linked traces.

Assuming non-instrumented browser, user sessions involve several requests to some backend, each generating a new trace. It is beneficial to keep all these traces consistently sampled. This can be achieved by generating the r-value for the first request as usual, but reusing it for all traces belonging to the same session. This requires a mapping from the session-id to the r-value, which can be technically challenging, but should be feasible.

Linked traces can be used in a number of ways, but one typical use case is when one trace leaves a message in queue to be picked up by another trace hours or days later. The root span of the consuming trace links itself to the producing trace. Again, it is beneficial to make the same sampling decisions for both traces. This can be helped by the consuming trace cloning the r-value from the producing trace.

Oct 21 '22 15:10 PeterF778

opentelemetry-specification opentelemetry-specification copied to clipboard

Review approach & specify algorithm for TraceIdRatioBasedSampler (ProbabilitySampler)

opentelemetry-specification
opentelemetry-specification copied to clipboard