dd-trace-java
dd-trace-java copied to clipboard
Optimize UTF-8 encoding by avoiding String.getBytes allocations
What Does This Do
Avoid allocations on String.getBytes by using pre-allocated buffers.
- [ ] Apply this optimization to JDK8 only
Before Optimization JDK8
Benchmark Mode Cnt Score Error Units
TracerMapperMap.mapEnrichedTracesV4 thrpt 5 1694.227 ± 155.292 ops/s
TracerMapperMap.mapEnrichedTracesV5 thrpt 5 4549.567 ± 638.028 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4 thrpt 5 1520.952 ± 116.112 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5 thrpt 5 4596.798 ± 90.008 ops/s
TracerMapperMap.mapTracesV4 thrpt 5 3137.220 ± 189.199 ops/s
TracerMapperMap.mapTracesV5 thrpt 5 8020.531 ± 553.607 ops/s
TracerMapperMap.mapTracesWithOriginV4 thrpt 5 2638.157 ± 83.134 ops/s
TracerMapperMap.mapTracesWithOriginV5 thrpt 5 6739.347 ± 145.361 ops/s
data:image/s3,"s3://crabby-images/6e799/6e7999d011abbac0d40e1dd2ce0bcdc64f2fcf41" alt="Screen Shot 2022-09-12 at 17 57 59"
After Optimization JDK8
Benchmark Mode Cnt Score Error Units
TracerMapperMap.mapEnrichedTracesV4 thrpt 5 2695.458 ± 51.168 ops/s
TracerMapperMap.mapEnrichedTracesV5 thrpt 5 5116.006 ± 449.946 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4 thrpt 5 2457.538 ± 14.304 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5 thrpt 5 4759.705 ± 285.102 ops/s
TracerMapperMap.mapTracesV4 thrpt 5 3956.454 ± 77.176 ops/s
TracerMapperMap.mapTracesV5 thrpt 5 8149.240 ± 85.705 ops/s
TracerMapperMap.mapTracesWithOriginV4 thrpt 5 3679.660 ± 37.950 ops/s
TracerMapperMap.mapTracesWithOriginV5 thrpt 5 7005.270 ± 719.345 ops/s
data:image/s3,"s3://crabby-images/74f0c/74f0c46a650859e278602ac5f99b690b041575bf" alt="Screen Shot 2022-09-12 at 17 58 22"
This optimization only makes sense with JDK8, JDK11 performs even better without this optimization
Before Optimization JDK11
Benchmark Mode Cnt Score Error Units
TracerMapperMap.mapEnrichedTracesV4 thrpt 5 3010.175 ± 139.760 ops/s
TracerMapperMap.mapEnrichedTracesV5 thrpt 5 4892.878 ± 138.931 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4 thrpt 5 2904.681 ± 65.676 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5 thrpt 5 4128.066 ± 323.213 ops/s
TracerMapperMap.mapTracesV4 thrpt 5 4740.278 ± 463.387 ops/s
TracerMapperMap.mapTracesV5 thrpt 5 7678.034 ± 324.922 ops/s
TracerMapperMap.mapTracesWithOriginV4 thrpt 5 4353.736 ± 83.136 ops/s
TracerMapperMap.mapTracesWithOriginV5 thrpt 5 6786.185 ± 207.185 ops/s
After Optimization JDK11
Benchmark Mode Cnt Score Error Units
TracerMapperMap.mapEnrichedTracesV4 thrpt 5 1869.690 ± 185.908 ops/s
TracerMapperMap.mapEnrichedTracesV5 thrpt 5 4671.995 ± 79.868 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4 thrpt 5 2222.731 ± 61.259 ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5 thrpt 5 4354.938 ± 245.457 ops/s
TracerMapperMap.mapTracesV4 thrpt 5 3698.241 ± 124.641 ops/s
TracerMapperMap.mapTracesV5 thrpt 5 7775.267 ± 166.205 ops/s
TracerMapperMap.mapTracesWithOriginV4 thrpt 5 2776.166 ± 190.427 ops/s
TracerMapperMap.mapTracesWithOriginV5 thrpt 5 6863.852 ± 143.374 ops/s
Which JDK version are these numbers for? I seem to recall this only making sense on JDK8 when I wrote the code originally, but if these numbers are actually JDK11 then that's great! If not, please post JDK11+ numbers too.
I'm wondering why we end up there, and there is no EncodingCache used in the writer 🤷🏼♂️
I'm wondering why we end up there, and there is no EncodingCache used in the writer 🤷🏼♂️
I removed the encoding cache a long time ago because it was counterproductive and unbounded and String.getBytes
appeared to be working so well in JDK11 it wasn't worth the bother.
Ah, now I remember. It shouldn't need to be unbounded though, and the same machinery introduced here could live inside the cache. Would make it easier to switch implementations based on the JDK as well.
Which JDK version are these numbers for? I seem to recall this only making sense on JDK8 when I wrote the code originally, but if these numbers are actually JDK11 then that's great! If not, please post JDK11+ numbers too.
These JMH test results are for JDK8. Looks like JDK11 have fixed this issue and this optimization doesn't provide any benefits. Assuming we will keep support JDK8 for a while from now we could apply this optimization for JDK8 only.
Ah, now I remember. It shouldn't need to be unbounded though, and the same machinery introduced here could live inside the cache. Would make it easier to switch implementations based on the JDK as well.
The encoding cache was a mistake, UTF8BytesString
for all the cacheable or constant strings which can be shared across spans is much better.