dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

speed up shimmer by about 50x

Open bengl opened this issue 1 year ago • 6 comments

The copyProperties function was doing an Object.setPrototypeOf, which is rather costly. We don't actually need this because all it was doing was acting as a stopgap in case we don't get all the properties from the original function. We're using Reflect.ownKeys() to get those properties, so this can't happen, therefore there's no need to set the prototype.

On a benchmark I whipped up separately for another project, I had noticed that shimmer + our instrumentations adds a significant amount of overhead. On a run with 100 iterations, I found that overhead to be about 10000%. When I ran the same benchmark after this change, I saw that overhead to be about 200%. To be fair, this benchmark is not a realistic application loading, and is explicitly clearing the cache and re-loading everything repeatedly, so I'm not sure how impactful this change will be. That said, we'll see what the benchmarks say.

bengl avatar Aug 27 '24 18:08 bengl

Overall package size

Self size: 8.24 MB Deduped: 94.81 MB No deduping: 95.38 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.2.2 | 29.27 MB | 29.27 MB | | @datadog/native-appsec | 8.3.0 | 19.37 MB | 19.38 MB | | @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB | | @datadog/pprof | 5.4.1 | 9.76 MB | 10.13 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.6.0 | 2.58 MB | 2.72 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 3.0.1 | 1.06 MB | 1.46 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB | | msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

github-actions[bot] avatar Aug 27 '24 18:08 github-actions[bot]

Benchmarks

Benchmark execution time: 2024-12-12 18:41:21

Comparing candidate commit 8172d517975f1c2d147f4844694e469996e6671e in PR branch bengl/faster-shimmer with baseline commit de0b516846fb812045367f26773a85717f5fbadd in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 261 metrics, 5 unstable metrics.

pr-commenter[bot] avatar Aug 27 '24 18:08 pr-commenter[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.13%. Comparing base (1f2914f) to head (d0a1c31). Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4633      +/-   ##
==========================================
+ Coverage   86.40%   92.13%   +5.72%     
==========================================
  Files          15      102      +87     
  Lines         581     3292    +2711     
  Branches       33       33              
==========================================
+ Hits          502     3033    +2531     
- Misses         79      259     +180     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 27 '24 18:08 codecov[bot]

I think this was mostly added as a safety net, but I don't know the scenarios where the prototype would not be accessible by getOwnPropertyDescriptors. However, I cannot reproduce this performance gain at all, and running quick benchmark locally result in the exact same timing. My assumption is that there is something specific in your setup where maybe the wrapping stays in place and gets reapplied on top of the previous wrapping or something similar, which would explain the exponential impact of running the benchmark longer. I think it would be worth it to find exactly why this is happening just in case there is some other bug that this either this is fixing or that would be present in the benchmark. In any case, I don't think it's needed, so it should be safe to remove.

rochdev avatar Aug 27 '24 22:08 rochdev

FWIW my benchmark was effectively:

const origRequire = Module.prototype.require

for (let i = 0; i < SOME_BIG_ENOUGH_NUMBER; i++) {
  require('dd-trace/packages/datadog-instrumentations')
  startTimer()
  require('undici')
  stopTime()
  clearCache()
  Module.prototype.require = origRequire
}

When I ran it with 0x, I saw that nearly all the time was being spent in setPrototypeOf.

I don't know the scenarios where the prototype would not be accessible by getOwnPropertyDescriptors.

We don't even need the prototype since we're copying all the ownKeys() and these are all functions. On the off chance that we're wrapping a function whose prototype is not Function.prototype, then we'd have a problem. If you want, I can add a check for that and special-case it. It should be pretty quick.

bengl avatar Aug 28 '24 15:08 bengl

If you want, I can add a check for that and special-case it.

If you don't think it's a valid real-world case I don't really mind. I was more interested to know why rerunning shimmer many times would cause an exponential delay in the benchmark, whereas running the function directly on a new object every time shows no improvement.

rochdev avatar Aug 28 '24 15:08 rochdev