dd-trace-rb

Single Span Sampling

Open marcotc opened this issue 2 years ago • 2 comments

This PR adds support for single span sampling to the tracer.

Single Span Sampling lets you configure sampling rules that keep individual spans even when their traces are dropped by trace-level sampling.
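The core idea can be sketched in plain Ruby. This is a simplified model, not dd-trace-rb's implementation: Span, SpanRule and single_span_sample are hypothetical names, and two behaviours are assumptions here: rules match the service and operation name with * and ? globs, and the first matching rule decides each span.

```ruby
# Simplified, hypothetical model of single span sampling; these names are
# illustrative and are not dd-trace-rb's API.
Span = Struct.new(:service, :name)

SpanRule = Struct.new(:service_glob, :name_glob, :sample_rate) do
  # Assumed glob syntax: "*" matches any run of characters, "?" exactly one.
  def self.glob_to_regexp(glob)
    body = Regexp.escape(glob).gsub('\*', '.*').gsub('\?', '.')
    Regexp.new("\\A#{body}\\z")
  end

  def match?(span)
    self.class.glob_to_regexp(service_glob).match?(span.service.to_s) &&
      self.class.glob_to_regexp(name_glob).match?(span.name.to_s)
  end
end

# Given the spans of a trace that trace-level sampling dropped, returns the
# spans to keep anyway. The first matching rule decides (an assumption);
# spans that match no rule stay dropped.
def single_span_sample(spans, rules, rng: Random.new)
  spans.select do |span|
    rule = rules.find { |r| r.match?(span) }
    !rule.nil? && rng.rand < rule.sample_rate
  end
end

rules = [SpanRule.new('web-*', 'http.request', 1.0)]
spans = [Span.new('web-api', 'http.request'), Span.new('db', 'query')]
kept = single_span_sample(spans, rules)
p kept.map(&:service) # => ["web-api"]
```

Only the span whose service and name match a rule survives; with a sample_rate of 1.0 every matching span is kept, and lower rates keep a random fraction of them.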

It is configured through the documented environment variables DD_SPAN_SAMPLING_RULES (the rules as inline JSON) and DD_SPAN_SAMPLING_RULES_FILE (a path to a JSON file containing the rules).
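A minimal sketch of how a rules value such as the one in DD_SPAN_SAMPLING_RULES could be parsed. The JSON keys service, name, sample_rate and max_per_second follow Datadog's public span sampling documentation; ParsedRule, parse_span_sampling_rules and the defaults for missing keys are hypothetical and are not this PR's rule_parser.rb.

```ruby
require 'json'

# Hypothetical parsed form of one rule; not dd-trace-rb's Rule class.
ParsedRule = Struct.new(:service, :name, :sample_rate, :max_per_second)

# Parses a JSON array of rule objects such as the value of
# DD_SPAN_SAMPLING_RULES. Assumed defaults for missing keys: match any
# service or name ("*"), keep every matching span (1.0), no rate limit (nil).
def parse_span_sampling_rules(json)
  JSON.parse(json).map do |h|
    ParsedRule.new(
      h.fetch('service', '*'),
      h.fetch('name', '*'),
      Float(h.fetch('sample_rate', 1.0)),
      h['max_per_second']
    )
  end
end

rules = parse_span_sampling_rules(<<~JSON)
  [{"service": "web-*", "name": "http.request", "sample_rate": 0.5, "max_per_second": 50}]
JSON

# In a real process the input would come from the environment, e.g.
# parse_span_sampling_rules(ENV.fetch('DD_SPAN_SAMPLING_RULES', '[]'))
```

Keeping the parsed rules as plain values means the per-span work later is only matching and a random draw, not JSON handling.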

All changes in this feature branch have been individually reviewed.

marcotc, Jul 05 '22 23:07

Codecov Report

Merging #2128 (1abd498) into master (96c395d) will increase coverage by 0.00%. The diff coverage is 97.52%.

@@           Coverage Diff            @@
##           master    #2128    +/-   ##
========================================
  Coverage   97.53%   97.53%            
========================================
  Files        1040     1047     +7     
  Lines       53871    54232   +361     
========================================
+ Hits        52542    52896   +354     
- Misses       1329     1336     +7     
Impacted Files                                          Coverage Δ
lib/datadog/tracing/sampling/rate_sampler.rb            100.00% <ø> (ø)
lib/datadog/tracing/tracer.rb                           95.26% <70.00%> (-2.11%)
lib/datadog/core/configuration/settings.rb              98.85% <85.71%> (-1.15%)
.../datadog/tracing/sampling/span/rule_parser_spec.rb   98.76% <98.76%> (ø)
lib/datadog/tracing/configuration/ext.rb                100.00% <100.00%> (ø)
lib/datadog/tracing/sampling/rate_limiter.rb            97.26% <100.00%> (+0.07%)
lib/datadog/tracing/sampling/span/ext.rb                100.00% <100.00%> (ø)
lib/datadog/tracing/sampling/span/matcher.rb            100.00% <100.00%> (ø)
lib/datadog/tracing/sampling/span/rule.rb               100.00% <100.00%> (ø)
lib/datadog/tracing/sampling/span/rule_parser.rb        100.00% <100.00%> (ø)
... and 6 more


codecov-commenter, Jul 05 '22 23:07

Here are the benchmark results.

Memory usage does not change significantly. The following benchmarks measure execution time.

The labels 1:, 10:, and 100: give the trace size under test, in number of spans. Each "Nx slower" figure compares that trace size with the 1-span trace in the same benchmark.

TraceOperation is kept by trace-level sampling:

1. And no single span sampling is configured (baseline):

This code path does not consult the single span sampler.

                   1:    17532.0 i/s
                  10:     3640.9 i/s - 4.82x  (± 0.00) slower
                 100:      443.4 i/s - 39.54x  (± 0.00) slower
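The "Nx slower" figures follow directly from the i/s numbers: benchmark-ips divides the fastest entry's rate by each entry's rate, and in every benchmark here the fastest entry is the 1-span trace. A quick check against the baseline figures, using only numbers copied from the output:

```ruby
# i/s figures copied from benchmark 1 (baseline), keyed by spans per trace.
baseline = { 1 => 17532.0, 10 => 3640.9, 100 => 443.4 }

# benchmark-ips reports "Nx slower" as the fastest entry's i/s divided by
# each entry's i/s; here the fastest entry is the 1-span trace.
fastest = baseline.values.max
slowdowns = baseline.transform_values { |ips| (fastest / ips).round(2) }
p slowdowns # values: 1.0, 4.82, 39.54
```

The same division reproduces the factors reported for benchmarks 2 to 4.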

TraceOperation is rejected by trace-level sampling:

2. And no single span sampling is configured:

This code path does not consult the single span sampler.

                   1:    20786.5 i/s
                  10:     4217.3 i/s - 4.93x  (± 0.00) slower
                 100:      474.6 i/s - 43.80x  (± 0.00) slower

3. Single span sampling is configured and all spans are rejected:

The difference between this benchmark and the previous one is the cost of consulting the single span rules.

                   1:    19565.4 i/s
                  10:     3926.7 i/s - 4.98x  (± 0.00) slower
                 100:      436.3 i/s - 44.85x  (± 0.00) slower

4. Single span sampling is configured and all spans are kept:

One side effect of single span sampling is that three tags are added to each span it keeps, so more overhead is expected.
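A sketch of the tagging step, to show where the extra per-span work comes from. The three tag names and the mechanism value 8 are assumptions taken from Datadog's public single span sampling specification, not read from this PR's diff; tags is modelled as a plain Hash, and tag_single_sampled! is a hypothetical helper.

```ruby
# Assumed tag names and mechanism value (from Datadog's public single span
# sampling spec, not read from this PR's diff).
TAG_MECHANISM      = '_dd.span_sampling.mechanism'
TAG_RULE_RATE      = '_dd.span_sampling.rule_rate'
TAG_MAX_PER_SECOND = '_dd.span_sampling.max_per_second'
MECHANISM_SPAN_SAMPLING_RATE = 8

# Hypothetical helper: records on a span's tag Hash that it was kept by
# single span sampling. Each kept span gets three extra tags, work that
# spans dropped by the rules never pay for.
def tag_single_sampled!(tags, sample_rate:, max_per_second:)
  tags[TAG_MECHANISM] = MECHANISM_SPAN_SAMPLING_RATE
  tags[TAG_RULE_RATE] = sample_rate
  tags[TAG_MAX_PER_SECOND] = max_per_second
  tags
end

span_tags = tag_single_sampled!({}, sample_rate: 1.0, max_per_second: 50)
```

Because the cost scales with the number of kept spans, it shows up in benchmark 4 (all spans kept) but not in benchmark 3 (all spans rejected).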

                   1:    15104.6 i/s
                  10:     3080.9 i/s - 4.90x  (± 0.00) slower
                 100:      365.4 i/s - 41.34x  (± 0.00) slower

Conclusions

The only code path with a meaningful performance impact is 4. That can be attributed to the extra tags added to each single-sampled span, as well as the time it takes to match each span in the trace against the configured rules.

The rules by themselves are not very expensive: the difference between 3 and 2, roughly 6% to 8% lower throughput, is effectively the cost of consulting the single span rules for every span in a trace. In fact, the baseline (1) and 3 are closely matched: this means that "keeping all spans" costs about as much as "dropping the trace plus consulting the single span sampling rules".
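The comparisons above can be re-derived from the posted i/s figures. This only recomputes percentages from the numbers already reported; the variable names are illustrative.

```ruby
# i/s figures copied from the benchmarks above, keyed by spans per trace.
kept_no_rules    = { 1 => 17532.0, 10 => 3640.9, 100 => 443.4 } # benchmark 1
dropped_no_rules = { 1 => 20786.5, 10 => 4217.3, 100 => 474.6 } # benchmark 2
dropped_rules    = { 1 => 19565.4, 10 => 3926.7, 100 => 436.3 } # benchmark 3

# Percent of throughput lost by consulting the rules (3 relative to 2).
rule_cost = dropped_no_rules.map { |size, ips|
  [size, ((1 - dropped_rules[size] / ips) * 100).round(1)]
}.to_h

# Percent difference of 3 relative to the baseline (positive: 3 is faster).
vs_baseline = kept_no_rules.map { |size, ips|
  [size, ((dropped_rules[size] / ips - 1) * 100).round(1)]
}.to_h

p rule_cost   # values: 5.9, 6.9, 8.1
p vs_baseline # values: 11.6, 7.8, -1.6
```

The rule cost grows slightly with trace size, as expected when every span is checked against the rules, while 3 stays within about 12% of the baseline and is within 2% of it at 100 spans.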

Perhaps unsurprisingly, dropped traces (2) are cheaper than sampled traces (1), likely because the PrioritySampler and RuleSampler don't have to be consulted. Neither 1 nor 2 has Single Span Sampling configured, so this is a tracer-level sampling overhead that existed before this PR.

marcotc, Aug 08 '22 23:08