[server][router] Replace synchronized with CAS in TokenBucket::update to reduce contention
[server][router] Replace synchronized with CAS in TokenBucket::update to reduce contention
Just want to run the idea by everyone for feedback. Although based on the experiments this does not fix the latency issue that we're seeing.
How was this PR tested?
CI
Does this PR introduce any user-facing changes?
- [x] No. You can skip the rest of this section.
- [ ] Yes. Make sure to explain your proposed changes and call out the behavior change.
@sushantmane Have you collected the latency spent on this function during the benchmarking?
Feel free to try this microbench with/without your change... but so far, the thinking is that TokenBucket is not a bottleneck...
For low-contention scenarios (few threads), both implementations perform similarly well. As contention increases, the CAS implementation shows slightly better performance and scalability. However, the performance gain of CAS over synchronized is not dramatic.
Benchmark with synchronized
Benchmark (rcuPerSecond) (tokensToConsume) Mode Cnt Score Error Units
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000000000 1 avgt 3 0.012 ± 0.011 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000000000 100 avgt 3 0.012 ± 0.015 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000 1 avgt 3 0.050 ± 0.024 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000 100 avgt 3 0.045 ± 0.040 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000000000 1 avgt 3 0.107 ± 0.010 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000000000 100 avgt 3 0.099 ± 0.034 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000 1 avgt 3 0.148 ± 0.055 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000 100 avgt 3 0.136 ± 0.021 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000000000 1 avgt 3 0.389 ± 0.332 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000000000 100 avgt 3 0.281 ± 0.217 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000 1 avgt 3 0.347 ± 0.106 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000 100 avgt 3 0.325 ± 0.045 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000000000 1 avgt 3 1.179 ± 0.031 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000000000 100 avgt 3 1.081 ± 0.183 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000 1 avgt 3 0.784 ± 0.257 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000 100 avgt 3 0.809 ± 0.019 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000000000 1 avgt 3 2.332 ± 0.552 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000000000 100 avgt 3 2.197 ± 0.748 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000 1 avgt 3 1.045 ± 0.540 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000 100 avgt 3 1.291 ± 0.114 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000000000 1 avgt 3 5.976 ± 6.064 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000000000 100 avgt 3 4.334 ± 2.897 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000 1 avgt 3 3.024 ± 4.573 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000 100 avgt 3 2.154 ± 0.571 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000000000 1 avgt 3 12.554 ± 2.907 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000000000 100 avgt 3 14.865 ± 1.765 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000 1 avgt 3 5.123 ± 0.622 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000 100 avgt 3 6.732 ± 2.387 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000000000 1 avgt 3 23.245 ± 11.758 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000000000 100 avgt 3 24.848 ± 28.493 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000 1 avgt 3 11.619 ± 2.024 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000 100 avgt 3 11.881 ± 3.701 us/op
Benchmark with CAS
Benchmark (rcuPerSecond) (tokensToConsume) Mode Cnt Score Error Units
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000000000 1 avgt 3 0.012 ± 0.016 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000000000 100 avgt 3 0.011 ± 0.004 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000 1 avgt 3 0.043 ± 0.004 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount1 1000000 100 avgt 3 0.044 ± 0.004 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000000000 1 avgt 3 0.092 ± 0.042 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000000000 100 avgt 3 0.119 ± 0.057 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000 1 avgt 3 0.134 ± 0.023 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount2 1000000 100 avgt 3 0.147 ± 0.052 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000000000 1 avgt 3 0.303 ± 0.138 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000000000 100 avgt 3 0.304 ± 0.124 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000 1 avgt 3 0.309 ± 0.324 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount4 1000000 100 avgt 3 0.287 ± 0.095 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000000000 1 avgt 3 0.770 ± 0.233 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000000000 100 avgt 3 0.767 ± 0.062 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000 1 avgt 3 0.621 ± 0.109 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount8 1000000 100 avgt 3 0.714 ± 0.119 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000000000 1 avgt 3 2.442 ± 0.218 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000000000 100 avgt 3 1.952 ± 0.359 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000 1 avgt 3 1.302 ± 0.096 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount16 1000000 100 avgt 3 1.118 ± 0.434 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000000000 1 avgt 3 6.113 ± 7.613 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000000000 100 avgt 3 4.932 ± 1.030 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000 1 avgt 3 2.449 ± 0.455 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount32 1000000 100 avgt 3 1.824 ± 0.334 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000000000 1 avgt 3 11.493 ± 7.086 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000000000 100 avgt 3 9.289 ± 3.053 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000 1 avgt 3 4.232 ± 1.228 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount64 1000000 100 avgt 3 5.036 ± 0.383 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000000000 1 avgt 3 24.300 ± 2.740 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000000000 100 avgt 3 17.554 ± 7.263 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000 1 avgt 3 13.262 ± 4.020 us/op
TokenBucketBenchmark.tokenBucketWithThreadCount128 1000000 100 avgt 3 12.398 ± 4.895 us/op
Feel free to try this microbench with/without your change... but so far, the thinking is that TokenBucket is not a bottleneck...
For low-contention scenarios (few threads), both implementations perform similarly well. As contention increases, the CAS implementation shows slightly better performance and scalability. However, the performance gain of CAS over synchronized is not dramatic.
SG. By the way, I put this microbench in PR #1122. We can merge it if you think it's useful.
However, the performance gain of CAS over synchronized is not dramatic.
Hence, closing this PR!