libdatadog icon indicating copy to clipboard operation
libdatadog copied to clipboard

[ddcommon] Add token bucket rate limiter implementation

Open brettlangdon opened this issue 1 year ago • 10 comments
trafficstars

What does this PR do?

Add an implementation of a token bucket rate limiter to ddcommon.

Motivation

Meant as a reference implementation of the rate limiter intended to be used by tracing clients for their sampling implementation.

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Describe here in detail how the change can be validated.

brettlangdon avatar Jul 27 '24 14:07 brettlangdon

Benchmarks

This comment was omitted because it was over 65536 characters.Please check the Gitlab Job logs to see its output.

pr-commenter[bot] avatar Jul 27 '24 14:07 pr-commenter[bot]

Hm. A couple notes:

  • the time resolution on windows is ... rather milliseconds than nanoseconds. Not a fundamental problem, but it makes not much sense to have interval in anything but at least milliseconds, I'd say
  • the time is using system time, which is susceptible to ... system time changes, i.e. not monotonic.

I do already have an implementation of a rate limiter https://github.com/DataDog/libdatadog/blob/d79091ec56a5050f7e0a5053829b5ca5b12a2f71/sidecar/src/shm_limiters.rs I hadn't needed it outside of sidecar yet ... but probably should have pushed it to ddcommon.

That implementation is based on atomics. I very much need atomics as the purpose of the limiter is being able to work across processes via shared memory, something I can't just Mutex. Also my use case involves the ability to change the rate limit on an existing limiter. (i.e. being friendly to that scenario).

Do you want to keep your implementation, or should we rather take my limiter implementation and put the LocalLimiter I have into ddcommon (and possibly the ShmLimiter into ipc crate)?

bwoebi avatar Jul 27 '24 18:07 bwoebi

@bwoebi my original implementation was using clocksource as a way to avoid the issues with SystemTime, that was something I wanted to revisit here.

I don't care too much about whose implementation is taken, but having a shared token bucket rate limiter implementation in ddcommon would be good.

brettlangdon avatar Jul 28 '24 13:07 brettlangdon

That implementation is based on atomics. I very much need atomics as the purpose of the limiter is being able to work across processes via shared memory, something I can't just Mutex.

Is there a reason you need it to work across processes?

For most of the language libraries the sampling rate limiter is per-process.

brettlangdon avatar Jul 28 '24 13:07 brettlangdon

Codecov Report

Attention: Patch coverage is 93.08176% with 11 lines in your changes missing coverage. Please review.

Project coverage is 70.58%. Comparing base (eb62a6d) to head (0329035).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #552      +/-   ##
==========================================
+ Coverage   70.49%   70.58%   +0.08%     
==========================================
  Files         213      214       +1     
  Lines       28412    28588     +176     
==========================================
+ Hits        20030    20179     +149     
- Misses       8382     8409      +27     
Components Coverage Δ
crashtracker 25.21% <ø> (+0.07%) :arrow_up:
datadog-alloc 98.73% <ø> (ø)
data-pipeline 50.00% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 86.79% <93.08%> (+0.56%) :arrow_up:
ddcommon-ffi 74.58% <ø> (ø)
ddtelemetry 58.95% <ø> (ø)
ipc 84.18% <ø> (ø)
profiling 78.09% <ø> (ø)
profiling-ffi 56.76% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 35.42% <ø> (ø)
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 54.98% <ø> (ø)
trace-mini-agent 71.14% <ø> (ø)
trace-normalization 98.24% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.16% <ø> (ø)
trace-utils 91.03% <ø> (-0.60%) :arrow_down:

codecov-commenter avatar Jul 28 '24 14:07 codecov-commenter

@brettlangdon The primary problem is that PHP tends to have many processes in some configurations, with autoscaling the number of processes depending on load, making the sample rate ... unpredictable. Which is why shared rate limiting is needed - you just set it and it's valid for your host, which is a perfectly known quantity.

bwoebi avatar Jul 29 '24 08:07 bwoebi

@brettlangdon The primary problem is that PHP tends to have many processes in some configurations, with autoscaling the number of processes depending on load, making the sample rate ... unpredictable. Which is why shared rate limiting is needed - you just set it and it's valid for your host, which is a perfectly known quantity.

What is the performance overhead of using a shared memory rate limiter vs an in-process one? I'd expect it to be quite heavy?

brettlangdon avatar Jul 29 '24 12:07 brettlangdon

@brettlangdon Why would there be any extra overhead (apart from the initial mmap call during process init)? It's just pointing to some memory.

bwoebi avatar Jul 29 '24 12:07 bwoebi

@brettlangdon Why would there be any extra overhead (apart from the initial mmap call during process init)? It's just pointing to some memory.

Wouldn't we now be creating contention between processes on shared memory updates?

brettlangdon avatar Jul 29 '24 12:07 brettlangdon

That's why the implementation is based on atomics, to avoid any lock contention.

bwoebi avatar Jul 29 '24 12:07 bwoebi

Superseeded by #560.

bwoebi avatar Sep 04 '24 11:09 bwoebi