dd-trace-java icon indicating copy to clipboard operation
dd-trace-java copied to clipboard

Optimize UTF-8 encoding by avoiding String.getBytes allocations

Open ygree opened this issue 2 years ago • 6 comments

What Does This Do

Avoid allocations on String.getBytes by using pre-allocated buffers.

  • [ ] Apply this optimization to JDK8 only

Before Optimization JDK8

Benchmark                                       Mode  Cnt     Score     Error  Units
TracerMapperMap.mapEnrichedTracesV4            thrpt    5  1694.227 ± 155.292  ops/s
TracerMapperMap.mapEnrichedTracesV5            thrpt    5  4549.567 ± 638.028  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4  thrpt    5  1520.952 ± 116.112  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5  thrpt    5  4596.798 ±  90.008  ops/s
TracerMapperMap.mapTracesV4                    thrpt    5  3137.220 ± 189.199  ops/s
TracerMapperMap.mapTracesV5                    thrpt    5  8020.531 ± 553.607  ops/s
TracerMapperMap.mapTracesWithOriginV4          thrpt    5  2638.157 ±  83.134  ops/s
TracerMapperMap.mapTracesWithOriginV5          thrpt    5  6739.347 ± 145.361  ops/s
Screen Shot 2022-09-12 at 17 57 59

After Optimization JDK8

Benchmark                                       Mode  Cnt     Score     Error  Units
TracerMapperMap.mapEnrichedTracesV4            thrpt    5  2695.458 ±  51.168  ops/s
TracerMapperMap.mapEnrichedTracesV5            thrpt    5  5116.006 ± 449.946  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4  thrpt    5  2457.538 ±  14.304  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5  thrpt    5  4759.705 ± 285.102  ops/s
TracerMapperMap.mapTracesV4                    thrpt    5  3956.454 ±  77.176  ops/s
TracerMapperMap.mapTracesV5                    thrpt    5  8149.240 ±  85.705  ops/s
TracerMapperMap.mapTracesWithOriginV4          thrpt    5  3679.660 ±  37.950  ops/s
TracerMapperMap.mapTracesWithOriginV5          thrpt    5  7005.270 ± 719.345  ops/s
Screen Shot 2022-09-12 at 17 58 22

This optimization only makes sense with JDK8, JDK11 performs even better without this optimization

Before Optimization JDK11

Benchmark                                       Mode  Cnt     Score     Error  Units
TracerMapperMap.mapEnrichedTracesV4            thrpt    5  3010.175 ± 139.760  ops/s
TracerMapperMap.mapEnrichedTracesV5            thrpt    5  4892.878 ± 138.931  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4  thrpt    5  2904.681 ±  65.676  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5  thrpt    5  4128.066 ± 323.213  ops/s
TracerMapperMap.mapTracesV4                    thrpt    5  4740.278 ± 463.387  ops/s
TracerMapperMap.mapTracesV5                    thrpt    5  7678.034 ± 324.922  ops/s
TracerMapperMap.mapTracesWithOriginV4          thrpt    5  4353.736 ±  83.136  ops/s
TracerMapperMap.mapTracesWithOriginV5          thrpt    5  6786.185 ± 207.185  ops/s

After Optimization JDK11

Benchmark                                       Mode  Cnt     Score     Error  Units
TracerMapperMap.mapEnrichedTracesV4            thrpt    5  1869.690 ± 185.908  ops/s
TracerMapperMap.mapEnrichedTracesV5            thrpt    5  4671.995 ±  79.868  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV4  thrpt    5  2222.731 ±  61.259  ops/s
TracerMapperMap.mapEnrichedTracesWithOriginV5  thrpt    5  4354.938 ± 245.457  ops/s
TracerMapperMap.mapTracesV4                    thrpt    5  3698.241 ± 124.641  ops/s
TracerMapperMap.mapTracesV5                    thrpt    5  7775.267 ± 166.205  ops/s
TracerMapperMap.mapTracesWithOriginV4          thrpt    5  2776.166 ± 190.427  ops/s
TracerMapperMap.mapTracesWithOriginV5          thrpt    5  6863.852 ± 143.374  ops/s

ygree avatar Sep 13 '22 00:09 ygree

Which JDK version are these numbers for? I seem to recall this only making sense on JDK8 when I wrote the code originally, but if these numbers are actually JDK11 then that's great! If not, please post JDK11+ numbers too.

richardstartin avatar Sep 13 '22 06:09 richardstartin

I'm wondering why we end up there, and there is no EncodingCache used in the writer 🤷🏼‍♂️

bantonsson avatar Sep 13 '22 08:09 bantonsson

I'm wondering why we end up there, and there is no EncodingCache used in the writer 🤷🏼‍♂️

I removed the encoding cache a long time ago because it was counterproductive and unbounded and String.getBytes appeared to be working so well in JDK11 it wasn't worth the bother.

richardstartin avatar Sep 13 '22 08:09 richardstartin

Ah, now I remember. It shouldn't need to be unbounded though, and the same machinery introduced here could live inside the cache. Would make it easier to switch implementations based on the JDK as well.

bantonsson avatar Sep 13 '22 09:09 bantonsson

Which JDK version are these numbers for? I seem to recall this only making sense on JDK8 when I wrote the code originally, but if these numbers are actually JDK11 then that's great! If not, please post JDK11+ numbers too.

These JMH test results are for JDK8. Looks like JDK11 have fixed this issue and this optimization doesn't provide any benefits. Assuming we will keep support JDK8 for a while from now we could apply this optimization for JDK8 only.

ygree avatar Sep 13 '22 17:09 ygree

Ah, now I remember. It shouldn't need to be unbounded though, and the same machinery introduced here could live inside the cache. Would make it easier to switch implementations based on the JDK as well.

The encoding cache was a mistake, UTF8BytesString for all the cacheable or constant strings which can be shared across spans is much better.

richardstartin avatar Sep 13 '22 17:09 richardstartin