Improve uuid performance
Each transaction is currently assigned a uuid v4. We use the uuid module, which internally uses crypto.randomBytes(16) to generate the uuid.
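For reference, the current approach boils down to something like this minimal sketch (variable names are illustrative, not the agent's actual call site):

```js
// Current approach (simplified): uuid v4 from the uuid module,
// which calls crypto.randomBytes(16) under the hood
const uuidv4 = require('uuid/v4')

const transactionId = uuidv4() // e.g. '3a017fc5-4f50-4db9-b0ce-4547ba0a1bfd'
```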
This is currently one of our biggest performance drains when creating new transactions, so I ran some benchmarks to see if there were any faster alternatives.
Here are the best (🔥) synchronous performers from the benchmark, compared to our current method (🎯):
| Method | Leaky | Format | Re-use | Cache | Ops/sec | RME |
|---|---|---|---|---|---|---|
| uuid/v1 | 💦 | uuid | | n/a | 1,074,518 | ±1.17% |
| uuid/v1 | 💦 | hex | | n/a | 404,325 | ±1.63% |
| uuid/v1 | 💦 | hex | ✅ | n/a | 545,523 | ±1.14% |
| uuid/v4 🎯 | | uuid | | n/a | 233,282 | ±2.16% |
| uuid/v4 | | hex | | n/a | 207,050 | ±4.65% |
| uuid/v4 | | hex | ✅ | n/a | 243,580 | ±3.50% |
| fast-uuid | 💦 | uuid | | n/a | 630,719 | ±0.83% |
| uuid-random 🔥 | | uuid | | 512 | 1,214,445 | ±0.42% |
| uuid-random 🔥 | | hex | | 512 | 1,159,229 | ±0.80% |
| sodium-uuid | | hex | | n/a | 214,087 | ±0.51% |
| sodium-uuid | | hex | ✅ | n/a | 220,409 | ±0.54% |
| hyperid 🔥 | 💦 | custom | | n/a | 8,843,573 | ±1.52% |
| crypto.randomBytes | | hex | | n/a | 243,894 | ±3.27% |
| crypto.randomBytes | | hex | | 256 | 932,165 | ±1.11% |
| crypto.randomBytes | | hex | ✅ | 256 | 897,471 | ±0.95% |
| crypto.randomBytes 🔥 | | hex | | 4096 | 1,221,396 | ±0.64% |
| crypto.randomBytes 🔥 | | hex | ✅ | 4096 | 1,140,619 | ±0.42% |
| crypto.randomBytes 🔥 | | hex | | 65536 | 1,199,742 | ±4.04% |
| crypto.randomBytes 🔥 | | hex | ✅ | 65536 | 1,163,009 | ±0.63% |
| crypto.randomFillSync | | hex | ✅ | n/a | 309,164 | ±0.52% |
| crypto.randomFillSync | | hex | | 256 | 1,053,831 | ±0.77% |
| crypto.randomFillSync | | hex | ✅ | 256 | 1,012,593 | ±0.52% |
| crypto.randomFillSync 🔥 | | hex | | 4096 | 1,318,476 | ±0.69% |
| crypto.randomFillSync 🔥 | | hex | ✅ | 4096 | 1,253,235 | ±0.39% |
| crypto.randomFillSync 🔥 | | hex | | 65536 | 1,294,847 | ±1.67% |
| crypto.randomFillSync 🔥 | | hex | ✅ | 65536 | 1,295,193 | ±1.05% |
| /dev/random | | hex | ✅ | n/a | 228,396 | ±0.34% |
| /dev/random | | hex | ✅ | 256 | 446,458 | ±0.62% |
| /dev/random | | hex | ✅ | 4096 | 482,517 | ±0.89% |
| /dev/random | | hex | ✅ | 65536 | 540,828 | ±0.75% |
| /dev/urandom | 💦 | hex | ✅ | n/a | 227,162 | ±0.72% |
| /dev/urandom | 💦 | hex | ✅ | 256 | 447,749 | ±0.46% |
| /dev/urandom | 💦 | hex | ✅ | 4096 | 490,723 | ±0.49% |
| /dev/urandom | 💦 | hex | ✅ | 65536 | 542,039 | ±0.87% |
The full results and the benchmark source code are available at https://github.com/watson/uuid-benchmark.
- Method: Name of the npm module, Node.js core function, or OS-based random generator used
- Leaky: A 💦 indicates that the method used to generate the 128 bit number leaks metadata (e.g. a sequence number)
- Format:
  - `uuid` - ID is a UUID formatted string representing a 128 bit number, e.g. 3a017fc5-4f50-4db9-b0ce-4547ba0a1bfd
  - `hex` - ID is a hex formatted string representing a 128 bit number, e.g. 3a017fc54f504db9b0ce4547ba0a1bfd
  - `custom` - ID is not a UUID or 128 bit number, but is still globally unique
- Re-use: Indicates an output buffer was re-used between each test to potentially reduce the number of objects that needed to be created
- Cache:
  - `n/a` - Only the number of bytes required to generate a 128 bit number was read into memory
  - `<Number>` - Number of bytes read into memory the first time a 128 bit number was requested. Subsequent runs use the leftover bytes in the cache until it has been depleted, at which point another chunk of bytes is read into memory
- Ops/sec: Number of UUIDs generated per second
- RME: The relative margin of error (expressed as a percentage of the mean)
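To make the cached crypto.randomFillSync rows concrete, here is a minimal sketch of that approach, assuming hex-formatted ids and a 4096 byte cache (the helper name and cache size are just examples, not the benchmark's exact code):

```js
const crypto = require('crypto')

// Pre-allocated cache of random bytes (4096 was among the fastest sizes above)
const CACHE_SIZE = 4096
const cache = Buffer.alloc(CACHE_SIZE)
let offset = CACHE_SIZE // start "empty" so the first call triggers a refill

// Return a 128 bit id as a 32-character hex string
function fastHexId () {
  if (offset + 16 > CACHE_SIZE) {
    crypto.randomFillSync(cache) // refill the whole cache in one call
    offset = 0
  }
  const id = cache.toString('hex', offset, offset + 16)
  offset += 16
  return id
}
```

Each call then only pays for a small buffer-to-hex conversion; the cost of asking the OS for randomness is amortized over CACHE_SIZE / 16 ids.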
Gotcha
The crypto.randomFillSync function was added in Node.js v7.10.0. For older versions crypto.randomBytes can be used instead.
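A small feature check along these lines should cover both cases (untested sketch, not what the agent currently does):

```js
const crypto = require('crypto')

// Use randomFillSync where available (Node.js >= 7.10.0); otherwise fall
// back to randomBytes and copy the result into the caller's buffer
const fillRandom = typeof crypto.randomFillSync === 'function'
  ? crypto.randomFillSync
  : function (buf) {
      crypto.randomBytes(buf.length).copy(buf)
      return buf
    }
```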
~~I'm not 100% certain that uuid-random is actually secure. The judgement is purely based on what's written in its readme.~~ I read the source code, and it looks fine.
Discussion topics
- Are we ok with leaking some types of metadata to gain speed? E.g. a sequential number
- Are there any privacy/safety issues in using a cache to speed up generating multiple random numbers?
Just updated the table with a few more benchmarks
Since commit 51da76bb6f3ca425 the code no longer uses uuid/v4 as far as I can see. In the current tip of master (f695e142d9444) there appears to instead be just a single call to crypto.randomBytes(16).toString('hex'), which sadly has the same performance profile as uuid/v4. So I suppose this issue is still relevant.
@watson is there a branch with an alternate UUID generator that one can try? We are evaluating Elastic APM and are seeing about a 50-70% increase in 95th percentile response times due to APM. If we turn off transaction spans this drops to 12-20%, but that is still high. We were expecting a 1-3% impact, not something this high!
@saurabhnanda, we saw the same thing when we started using this module. I think the most important knob here is transactionSampleRate. Try setting it to 0.1, e.g. via the env variable ELASTIC_APM_TRANSACTION_SAMPLE_RATE.
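If you'd rather not touch the environment, the same setting can be passed to the agent's start() call; the 0.1 is just the example value from above:

```js
// Equivalent to ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0.1
require('elastic-apm-node').start({
  transactionSampleRate: 0.1
})
```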
@tsondergaard unfortunately reducing the sample rate is not something that we'd like to do. We also plan to use Elastic APM for logging all req/res in our production environment, so anything less than 100% (for transactionSampleRate) will force us to use yet another component for logging.
@saurabhnanda, all req/res will be recorded even with a reduced transactionSampleRate. If you set transactionSampleRate to 0.1 you will still have information about all HTTP requests and responses, including response codes; you will just have full traces for only 10% of those requests. My initial reaction to the performance of this module was exactly like yours, but I found out that a reduced sampling rate is not such a big deal because you still get all req/res recorded.
@tsondergaard thanks for the tip. Even the docs don't seem to explicitly mention this subtlety about transactionSampleRate.
What config are you using in production for the Node.js agent?
@saurabhnanda, the project I work on that uses the elastic-apm-node module is not in production yet, but the load test simulations run with a sampling rate of 0.1 to keep the overhead low enough that it does not significantly skew the results.
@tsondergaard what about captureSpanStackTraces? Do you have that enabled?
@saurabhnanda, yes, so far!
I don't think UUID generation is the cause of the problem you're describing, and there's no branch available where the UUID engine has been changed.
Have you checked out the Performance Tuning page in the docs?
This issue is still relevant and I'm seeing a 2-2.5x decrease in performance due to crypto.randomFillSync() being called on every transaction, regardless of sampling rate. I have opentelemetryBridgeEnabled set to true because I have a custom OTEL instrumentation.
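A minimal agent configuration matching that description would look something like this (sketch only; the real setup naturally has more options):

```js
// Agent started with the OpenTelemetry bridge enabled, as described above
require('elastic-apm-node').start({
  opentelemetryBridgeEnabled: true
})
```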