lib: rewrite AsyncLocalStorage without async_hooks
I'm working on rewriting AsyncLocalStorage to use v8::SetContinuationPreservedEmbedderData(...) in a similar model to Cloudflare and in anticipation of the upcoming AsyncContext proposal.
This is not done yet but is mostly working as-is. I have not yet gotten to benchmarking or writing tests for it, but it seems to be passing the existing AsyncLocalStorage tests for me. 😄
I'll be on vacation next week so this may sit for a bit, but feel free to review what I have and leave feedback on the approach.
cc @nodejs/diagnostics @nodejs/performance
Also worth considering that, as this no longer depends on async_hooks in any way, it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.
Some additional notes:
- ~This will have to be behind a build flag because of a conflict with Chromium using the same single-tenant API in Electron.~
- ~I have not yet wired this up to AsyncWrap. It will need some linkage there to actually work across real async boundaries.~
- ~current() and exchange() are exposed on the C++ side to make this straightforward, but I may add some scope helpers for that. We'll see if it's necessary.~
- ~I also need to make a PR to V8 to make it propagate embedder data through thenables.~
- There are probably still lots of bugs and the existing tests don't have great coverage. I'll improve those while I'm working on this as I want to be sure we're not breaking any edge cases.
Woo! Thanks for getting this started. The need for the flag is unfortunate but definitely required given how Chromium is using the API and the conflict that causes with Electron. But otherwise this should be a major improvement over the current state. Will do a thorough review this weekend.
Another note:
This currently uses a model of building a linked list of AsyncContextFrames, each containing only the key and value it was constructed with. The alternative, which is what was originally proposed, was to use a map that is cloned from the parent frame at each frame construction; the single additional entry for that frame is then added, but the map is considered otherwise immutable.
I went with the current model mostly because it was easier and made debugging simpler, but I plan to spend some time benchmarking the two approaches to identify the trade-offs and figure out which would be generally better. The one definite downside of the current model is that it keeps past values for the same store from further up the call graph alive longer, which may be undesirable, though that may only really apply to top-level scopes that need to be kept alive anyway.
Further analysis should provide some better insight into the behaviour.
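To make the comparison concrete, here's a minimal JS sketch of the two frame models (class names and shapes are hypothetical illustrations, not the actual implementation):

```javascript
// Illustrative only — the real frames are internal to Node.js.

// Current model: a linked list. Each frame holds only its own key/value
// pair plus a pointer to its parent; get() walks up toward the root.
// Construction is O(1), lookup is O(depth), and older values for the
// same key stay reachable as long as any descendant frame is alive.
class LinkedFrame {
  constructor(parent, key, value) {
    this.parent = parent;
    this.key = key;
    this.value = value;
  }
  get(key) {
    for (let frame = this; frame; frame = frame.parent) {
      if (frame.key === key) return frame.value;
    }
    return undefined;
  }
}

// Originally proposed alternative: clone the parent's map and add one
// entry, then treat the map as immutable. Lookup is O(1), but each frame
// construction copies an entry per live store.
class MapFrame extends Map {
  constructor(parent, key, value) {
    super(parent); // the Map constructor clones the parent's entries
    this.set(key, value);
  }
}
```

The trade-off the benchmarking needs to settle is essentially construction cost versus lookup cost, weighted by how many stores are typically live at once.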
This is awesome!
Also worth considering that as this no longer depends on async_hooks in any way it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.
That'd be a welcome change. We've been trying to eliminate async_hooks usage in our apps (as we've detected performance issues when async_hooks are misused), and having to import AsyncLocalStorage from the `async_hooks` module messes with how we detect whether an app is using async_hooks or not.
Thenables propagation fix is here: https://chromium-review.googlesource.com/c/v8/v8/+/4674242
I propose tagging this as semver-major.
@mcollina Could do that, though the way we land it is still not entirely decided. Due to the Electron conflict it needs build-flagging anyway, so this could land as a semver-minor if it's off by default, and then have a follow-up to turn it on by default as a semver-major. What do you think?
That works
I've got all but ~four~ two of the existing tests passing at this point:
- ~thenables~ — I have a fix for this here.
- `http.Agent`, where I haven't yet figured out why my existing `AsyncResource` patches don't cover it.
- ~`storage.disable()`, which currently only blocks propagation after the call but also needs to block for the rest of the tick before the call in order to pass the existing tests.~
- `unhandledRejection`, which does not match the context of the promise. James said he'll point me at how Cloudflare did it.
Hi @Qard, can you explain how the `v8::SetContinuationPreservedEmbedderData(...)` API works? I have read some docs and code, but I don't understand it very well. Thank you! It looks a bit like the `SetEmbedderData`/`GetEmbedderData` API.
The `SetContinuationPreservedEmbedderData` API stores a `v8::Value` in the `v8::Context`. Whenever a promise reaction job is created (meaning when a promise resolves or rejects), it captures the value held at that time and stores it on that reaction job. Whenever the reaction job runs (meaning a then/catch/finally continuation runs), it restores that value as the current stored value, then reverts to the previous value when the job is done.
This is essentially what `async_hooks` does currently with `executionAsyncResource`, but exclusively for promises. However, we can also manipulate that current value externally to capture and restore it around `AsyncWrap`, `AsyncResource`, and any other scope we need to manipulate to match our execution flow.
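A plain-JS model of that capture/restore behaviour might look like this (purely illustrative — the real value lives on the `v8::Context` inside V8, and all names here are made up):

```javascript
// `current` stands in for the continuation-preserved embedder data slot.
let current;

// Stand-in for reaction-job creation: snapshot the value held right now.
function makeReactionJob(continuation) {
  const captured = current;
  return (...args) => {
    // When the job runs, restore the captured value, run the
    // continuation, then revert to whatever was current before.
    const previous = current;
    current = captured;
    try {
      return continuation(...args);
    } finally {
      current = previous;
    }
  };
}

// The value at creation time ('A') wins over the value at run time ('B').
current = 'A';
const job = makeReactionJob(() => current);
current = 'B';
```

Calling `job()` at this point returns `'A'`, and `current` is back to `'B'` afterwards — the same shape of behaviour the API gives promise continuations.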
I've got all the tests passing locally. Moved out of draft so I can run the full CI. I haven't done any benchmarking yet, and I know there are for sure a few things I can do to improve performance. I wanted to go for test-passing first so I'd have a functional baseline to work against for performance tuning, and to ensure it stays within the boundaries of correctness while I think about improvements.
I also will need to figure out how we're going to test this, as if I switch the flag to off by default it won't be covered by the base CI. Any @nodejs/build folks that could help me out with getting a CI job to run the appropriate tests with `./configure --with-native-als`?
CI: https://ci.nodejs.org/job/node-test-pull-request/52961/
CI: https://ci.nodejs.org/job/node-test-pull-request/52967/
CI: https://ci.nodejs.org/job/node-test-pull-request/52971/
CI: https://ci.nodejs.org/job/node-test-pull-request/52973/
CI: https://ci.nodejs.org/job/node-test-pull-request/52980/
Looks like everything in full CI is working on at least one of those attempts. Not going to bother trying to outrun the flakes for a full green run before I finish the benchmarking work. Just wanted to verify with reasonable certainty that everything should be working everywhere. I'll get on to the benchmarking stuff this week and aim to hopefully have something ready to land by the end of the week. 🙂
I deleted my prior comment with benchmark results. Turns out my build config was getting cached and I was measuring with the wrong build flag state and thus using the old async_hooks path. 🤦🏻
Here's a new benchmark:
```
[00:12:17|% 100| 5/5 files | 60/60 runs | 1/1 configs]: Done
                                                                             confidence improvement accuracy (*)    (**)   (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10   ***    -37.49 % ±2.30%  ±3.05%  ±3.98%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100  ***    -24.54 % ±2.14%  ±2.85%  ±3.73%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000 ***    -26.05 % ±2.06%  ±2.75%  ±3.58%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1           ***     13.66 % ±2.32%  ±3.10%  ±4.06%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10          ***    -44.44 % ±1.86%  ±2.48%  ±3.26%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100         ***    -94.29 % ±1.85%  ±2.50%  ±3.32%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0        ***     -8.31 % ±2.02%  ±2.68%  ±3.50%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1        ***     94.22 % ±5.73%  ±7.68% ±10.11%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10       ***    261.74 % ±6.34%  ±8.50% ±11.19%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100      ***   1604.61 % ±24.98% ±33.66% ±44.67%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                     0.29 % ±2.31%  ±3.07%  ±4.00%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1            ***    192.86 % ±5.61%  ±7.51%  ±9.86%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10           ***    772.37 % ±34.02% ±45.85% ±60.87%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100          ***  13322.28 % ±203.33% ±274.03% ±363.80%
async_hooks/async-local-storage-run.js n=10000000                                       ***    -81.11 % ±0.54%  ±0.72%  ±0.96%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a 5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a 1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
```
Much better promise benchmark numbers. Some drops in other places though, which largely all appear to be due to native barrier transitions around getting or setting the context frame. I'm looking into the idea of making TurboFan builtins for the get and set functionality and using those as that should be a lot faster. 🤔
I'm actually not too worried about the benchmarks that drop, as most of those measure things that don't happen as often in real-world use, but I would still like to aim for improvements across the board. Regardless, that 130x performance boost on promise transitions is nice. 🤩
I think a -94% would be pretty problematic. I'm not sure the benefits of faster promises would outweigh this.
I think the slowdown of `getStore` is understandable, as the original implementation just calls JS code.
I think that if we can measure the difference in ms (or us) it will make it clearer if the impact is really relevant.
I'm looking into the idea of making TurboFan builtins for the get and set functionality and using those as that should be a lot faster. 🤔
Do these perform any allocations? We might be able to use fast API calls.
I'm not sure if they fit within the GC rules of fast calls. They conceptually should not be allocating, but I'm unclear whether the assignment or extraction from the native context slot is allowed. My understanding of fast calls from my last look was that they can't interact with any `Local<...>`, but perhaps that has changed or I misunderstood in the first place?
I've added a commit which includes the addition of TurboFan builtins in V8. I'll have to get a change for that upstreamed before landing this, but just pushing to show the changes and to share how this has improved the benchmarks. Here's the latest numbers with this change:
```
                                                                             confidence improvement accuracy (*)    (**)   (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10   ***     34.25 % ±4.96%  ±6.63%  ±8.67%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100  ***     15.35 % ±2.34%  ±3.12%  ±4.07%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000  **      4.78 % ±2.86%  ±3.82%  ±5.00%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1           ***     25.78 % ±6.02%  ±8.08% ±10.65%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10          ***     27.07 % ±2.29%  ±3.04%  ±3.96%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100         ***     11.66 % ±2.10%  ±2.79%  ±3.64%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0                 6.70 % ±7.83% ±10.49% ±13.82%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1        ***    139.27 % ±6.61%  ±8.84% ±11.62%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10       ***    267.42 % ±17.67% ±23.75% ±31.41%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100      ***   1677.81 % ±50.93% ±68.60% ±91.00%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                    -2.10 % ±4.92%  ±6.60%  ±8.70%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1            ***    192.00 % ±4.08%  ±5.49%  ±7.26%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10           ***    811.76 % ±10.74% ±14.48% ±19.22%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100          ***  14474.30 % ±236.50% ±318.73% ±423.15%
async_hooks/async-local-storage-run.js n=10000000                                       ***     45.79 % ±1.20%  ±1.62%  ±2.14%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a 5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a 1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
```
This also makes most of the C++ code irrelevant so I will be deleting a bunch of that too. I'll do some more cleanup after I get the V8 change submitted. Just wanted to share where the benchmarks are at now. 😄
There's one final remaining issue: context leaking. The `parallel/test-async-local-storage-contexts.js` test checks that each context gets its own separate AsyncLocalStorage graph even if the constructor or instance has been shared from another context. I'm not sure yet exactly how to solve this. 🤔
In the current model it acquires the get/set intrinsics from the context that ran the initial code defining the `AsyncContextFrame` class. The value can then be transferred to another context, but it will remain connected to the context where the class was defined. Anyone have ideas on how to deal with that? 🤔
I'm not sure yet exactly how to solve this. 🤔
Just a wild guess, but do the TurboFan builtins need to load the context from the HandleScopeImplementer (hsi), like how `GetEnteredOrMicrotaskContext` and `MicrotaskQueue` work? You could reuse this code to read the current one from the vector: https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/builtins-microtask-queue-gen.cc;l=384-412;drc=d708cc843536cc344f0f7679a978634cc14d6153;bpv=0;bpt=1
I've now got the builtins working against active context rather than creation context. With that all existing tests are now passing. Final benchmark results are:
```
                                                                             confidence improvement accuracy (*)    (**)   (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10   ***     37.01 % ±5.19%  ±6.91%  ±9.00%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100  ***     19.49 % ±2.97%  ±3.96%  ±5.15%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000 ***      7.55 % ±3.28%  ±4.36%  ±5.68%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1           ***     25.97 % ±2.66%  ±3.54%  ±4.62%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10          ***     28.12 % ±2.37%  ±3.16%  ±4.13%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100         ***     10.92 % ±2.90%  ±3.86%  ±5.03%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0        ***     11.72 % ±2.49%  ±3.32%  ±4.33%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1        ***    137.10 % ±3.89%  ±5.20%  ±6.81%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10       ***    263.46 % ±5.99%  ±8.07% ±10.70%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100      ***   1650.06 % ±41.27% ±55.62% ±73.83%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                     0.16 % ±2.10%  ±2.80%  ±3.65%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1            ***    195.33 % ±3.87%  ±5.21%  ±6.90%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10           ***    807.12 % ±11.39% ±15.35% ±20.38%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100          ***  14494.18 % ±190.06% ±256.14% ±340.05%
async_hooks/async-local-storage-run.js n=10000000                                       ***     41.39 % ±1.67%  ±2.22%  ±2.89%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a 5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a 1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
```
I'll get started on a CR to add the builtins to V8 and will update this again once we can get the changes for that and thenables support backported.
I've submitted a V8 CR to add the TurboFan builtins for the get/set functionality. I'm also still waiting for reviews on the thenables fix. Feel free to review/comment on either. Once those both land I'll get them backported to Node.js and then update this PR. 🎉