node icon indicating copy to clipboard operation
node copied to clipboard

lib: rewrite AsyncLocalStorage without async_hooks

Open Qard opened this issue 1 year ago • 4 comments

I'm working on rewriting AsyncLocalStorage to use v8::SetContinuationPreservedEmbedderData(...) in a similar model to Cloudflare and in anticipation of the upcoming AsyncContext proposal.

This is not done yet but is mostly working as-is. I have not got to benchmarking or writing tests for it yet, but it seems to be passing the existing AsyncLocalStorage tests for me. 😄

I'll be on vacation next week so this may sit for a bit, but feel free to review what I have and leave feedback on the approach.

cc @nodejs/diagnostics @nodejs/performance

Qard avatar Jun 22 '23 23:06 Qard

Review requested:

  • [ ] @nodejs/gyp
  • [ ] @nodejs/startup

nodejs-github-bot avatar Jun 22 '23 23:06 nodejs-github-bot

Also worth considering that as this no longer depends on async_hooks in any way it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.

Qard avatar Jun 23 '23 00:06 Qard

Some additional notes:

  • ~This will have to be behind a build flag because of a conflict with Chromium using the same single-tenant API in Electron.~
  • ~I have not yet wired this up to AsyncWrap. It will need some linkage there to actually work across real async boundaries.~
    • ~current() and exchange() are exposed on the C++ side to make this straightforward but may do some scope helpers for that. We'll see if it's necessary.~
  • ~I also need to make a PR to V8 to make it propagate embedder data through thenables.~
  • There are probably still lots of bugs and the existing tests don't have great coverage. I'll improve those while I'm working on this as I want to be sure we're not breaking any edge cases.

Qard avatar Jun 23 '23 05:06 Qard

Woo! Thanks for getting this started. The need for the flag is unfortunate but definitely required given how chromium is using the API and the conflict that causes with electron. But otherwise this should be a major improvement over the current state. Will do a thorough review this weekend.

jasnell avatar Jun 23 '23 14:06 jasnell

Another note:

This currently uses a model of building a linked-list of AsyncContextFrames which each contain only the key and value they were constructed with. The alternative which is what was originally proposed was to use a map which is cloned from the parent frame at each frame construction and then the single additional entry for that frame is added but it is considered otherwise immutable.

I went with the current model mostly because it was easier and made debugging simpler but I plan to spend some time benchmarking the two approaches to identify the trade-offs and figure out which approach would be generally better. The one definite downside to the current model is it holds past values for the same store from further up the call graph alive longer which may be undesirable but may only really apply to top-level scopes that will need to be kept alive anyway.

Further analysis should provide some better insight into the behaviour.

Qard avatar Jun 23 '23 19:06 Qard

This is awesome!

Also worth considering that as this no longer depends on async_hooks in any way it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.

That'd be a welcome change. We've been trying to eliminate async_hooks usage in our apps (as we've detected performance issues when async_hooks are misused), and having to import AsyncLocalStorage from the async_hooks module messes with how we detect if an app is using async_hooks or not.

mmarchini avatar Jun 30 '23 00:06 mmarchini

Thenables propagation fix is here: https://chromium-review.googlesource.com/c/v8/v8/+/4674242

Qard avatar Jul 09 '23 05:07 Qard

I propose tagging this as semver-major.

mcollina avatar Jul 09 '23 13:07 mcollina

@mcollina Could do that. Though the way in which we land it is still not entirely decided. Due to the Electron conflict it needs build-flagging anyway, so this could be landed as a minor if it's by default turned off. It could land off by default and then have a follow-up to turn it on by default as a major. What do you think?

Qard avatar Jul 10 '23 00:07 Qard

That works

mcollina avatar Jul 10 '23 05:07 mcollina

I've got all but ~four~ two of the existing tests passing at this point:

  • ~thenables~ I have a fix for this here.
  • http.Agent, which I haven't yet figured out why my existing AsyncResource patches don't cover it.
  • ~storage.disable(), which currently only blocks propagation after the call but also needs to block for the rest of the tick before the call too in order to pass the existing tests.~
  • unhandledRejection, which does not match the context of the promise. James said he'll point me at how Cloudflare did it.

Qard avatar Jul 20 '23 04:07 Qard

Hi, @Qard, Can you explain how the API v8::SetContinuationPreservedEmbedderData(...) work ? I have read some docs and code, but I don't understand it very well, thank you ! It looks a bit like SetEmbedderData/GetEmbedderData API.

theanarkh avatar Jul 21 '23 10:07 theanarkh

The SetContinuationPreservedEmbedderData API stores a v8::Value in the v8::Context. Whenever a promise reaction job is created (meaning when it resolves or rejects) it captures the value held at that time and stores it on that reaction job. Whenever the reaction job runs (meaning a then/catch/finally continuation runs) it restores that value as the current stored value and then reverts to the previous value when the job is done.

This is essentially what async_hooks does currently with executionAsyncResource but exclusively for promises. However we can also manipulate that current value externally to capture and restore the value around AsyncWrap, AsyncResource, and any other scope we need to manipulate to match our execution flow.

Qard avatar Jul 24 '23 21:07 Qard

I've got all the tests passing locally. Moved out of draft so I can run the full CI. I haven't done any benchmarking yet and I know there is for sure a few things I can do to improve performance. I wanted to go for test-passing first so I'd have a functional baseline to work against for performance tuning and ensuring it stays within the boundaries of correctness while I think about improvements.

I also will need to figure out how we're going to test this as if I switch the flag to off by default it won't be covered by the base CI. Any @nodejs/build folks that could help me out with getting a CI job to run the appropriate tests with ./configure --with-native-als?

Qard avatar Jul 28 '23 05:07 Qard

CI: https://ci.nodejs.org/job/node-test-pull-request/52961/

nodejs-github-bot avatar Jul 28 '23 09:07 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52967/

nodejs-github-bot avatar Jul 28 '23 19:07 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52971/

nodejs-github-bot avatar Jul 28 '23 23:07 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52973/

nodejs-github-bot avatar Jul 29 '23 06:07 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52980/

nodejs-github-bot avatar Jul 29 '23 16:07 nodejs-github-bot

Looks like everything in full CI is working on at least one of those attempts. Not going to bother trying to outrun the flakes for a full green run before I finish the benchmarking work. Just wanted to verify with reasonable certainty that everything should be working everywhere. I'll get on to the benchmarking stuff this week and aim to hopefully have something ready to land by the end of the week. 🙂

Qard avatar Jul 31 '23 06:07 Qard

I deleted my prior comment with benchmark results. Turns out my build config was getting cached and I was measuring with the wrong build flag state and thus using the old async_hooks path. 🤦🏻

Here's a new benchmark:

[00:12:17|% 100| 5/5 files | 60/60 runs | 1/1 configs]: Done
                                                                                         confidence improvement accuracy (*)     (**)    (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10          ***    -37.49 %       ±2.30%   ±3.05%   ±3.98%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100         ***    -24.54 %       ±2.14%   ±2.85%   ±3.73%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000        ***    -26.05 %       ±2.06%   ±2.75%   ±3.58%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1                  ***     13.66 %       ±2.32%   ±3.10%   ±4.06%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10                 ***    -44.44 %       ±1.86%   ±2.48%   ±3.26%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100                ***    -94.29 %       ±1.85%   ±2.50%   ±3.32%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0               ***     -8.31 %       ±2.02%   ±2.68%   ±3.50%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1               ***     94.22 %       ±5.73%   ±7.68%  ±10.11%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10              ***    261.74 %       ±6.34%   ±8.50%  ±11.19%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100             ***   1604.61 %      ±24.98%  ±33.66%  ±44.67%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                            0.29 %       ±2.31%   ±3.07%   ±4.00%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1                   ***    192.86 %       ±5.61%   ±7.51%   ±9.86%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10                  ***    772.37 %      ±34.02%  ±45.85%  ±60.87%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100                 ***  13322.28 %     ±203.33% ±274.03% ±363.80%
async_hooks/async-local-storage-run.js n=10000000                                              ***    -81.11 %       ±0.54%   ±0.72%   ±0.96%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

Much better promise benchmark numbers. Some drops in other places though, which largely all appear to be due to native barrier transitions around getting or setting the context frame. I'm looking into the idea of making TurboFan builtins for the get and set functionality and using those as that should be a lot faster. 🤔

I'm actually not too worried about the benchmarks that drop though as most of those are measuring things that don't actually happen as often in real-world use, but I would still like to aim for improvements across the board though. Regardless though, that 130x performance boost on promise transitions is nice. 🤩

Qard avatar Aug 17 '23 20:08 Qard

I think a -94% would be pretty problematic. I'm not sure the benefits of faster promises would outweight this.

mcollina avatar Aug 17 '23 21:08 mcollina

I think the slowdown of getStore is understandable as the original implementation just calls JS code

I think that if we can measure the difference in ms (or us) it will make it clearer if the impact is really relevant.

H4ad avatar Aug 17 '23 21:08 H4ad

I'm looking into the idea of making TurboFan builtins for the get and set functionality and using those as that should be a lot faster. 🤔

Do these perform any allocations? we might be able to use fast api calls.

devsnek avatar Aug 17 '23 22:08 devsnek

I'm not sure if they fit within the GC rules of fast-calls. They conceptually should not be allocating, but I'm unclear if the assignment or extraction from the native context slot is allowed. My understanding of fast-calls from my last look was that they can't interact with any Local<...>, but perhaps that has changed or I misunderstood in the first place?

Qard avatar Aug 17 '23 23:08 Qard

I've added a commit which includes the addition of TurboFan builtins in V8. I'll have to get a change for that upstreamed before landing this, but just pushing to show the changes and to share how this has improved the benchmarks. Here's the latest numbers with this change:

                                                                                        confidence improvement accuracy (*)     (**)    (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10          ***     34.25 %       ±4.96%   ±6.63%   ±8.67%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100         ***     15.35 %       ±2.34%   ±3.12%   ±4.07%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000         **      4.78 %       ±2.86%   ±3.82%   ±5.00%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1                  ***     25.78 %       ±6.02%   ±8.08%  ±10.65%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10                 ***     27.07 %       ±2.29%   ±3.04%   ±3.96%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100                ***     11.66 %       ±2.10%   ±2.79%   ±3.64%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0                        6.70 %       ±7.83%  ±10.49%  ±13.82%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1               ***    139.27 %       ±6.61%   ±8.84%  ±11.62%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10              ***    267.42 %      ±17.67%  ±23.75%  ±31.41%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100             ***   1677.81 %      ±50.93%  ±68.60%  ±91.00%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                           -2.10 %       ±4.92%   ±6.60%   ±8.70%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1                   ***    192.00 %       ±4.08%   ±5.49%   ±7.26%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10                  ***    811.76 %      ±10.74%  ±14.48%  ±19.22%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100                 ***  14474.30 %     ±236.50% ±318.73% ±423.15%
async_hooks/async-local-storage-run.js n=10000000                                              ***     45.79 %       ±1.20%   ±1.62%   ±2.14%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

This also makes most of the C++ code irrelevant so I will be deleting a bunch of that too. I'll do some more cleanup after I get the V8 change submitted. Just wanted to share where the benchmarks are at now. 😄

Qard avatar Aug 22 '23 02:08 Qard

There's one final remaining issue which is context leaking. The parallel/test-async-local-storage-contexts.js test is testing that each context gets its own separate AsyncLocalStorage graph even if the constructor or instance has been shared from another context. I'm not sure yet exactly how to solve this. 🤔

In the current model it will acquire the get/set instrinsics from the context that ran the initial code defining the AsyncContextFrame class. This value can then be transferred to another another context but will remain connected to the context where the class was defined. Anyone have ideas on how to deal with that? 🤔

Qard avatar Aug 22 '23 07:08 Qard

I'm not sure yet exactly how to solve this. 🤔

just a wild guess but do the tf builtins need to load the context from hsi like how GetEnteredOrMicrotaskContext and MicrotaskQueue work? you could reuse this code to read the current one from the vector https://source.chromium.org/chromium/chromium/src/+/main:v8/src/builtins/builtins-microtask-queue-gen.cc;l=384-412;drc=d708cc843536cc344f0f7679a978634cc14d6153;bpv=0;bpt=1

devsnek avatar Aug 22 '23 15:08 devsnek

I've now got the builtins working against active context rather than creation context. With that all existing tests are now passing. Final benchmark results are:

                                                                                        confidence improvement accuracy (*)     (**)    (***)
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=10          ***     37.01 %       ±5.19%   ±6.91%   ±9.00%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=100         ***     19.49 %       ±2.97%   ±3.96%   ±5.15%
async_hooks/async-local-storage-getstore-nested-resources.js n=10000 resourceCount=1000        ***      7.55 %       ±3.28%   ±4.36%   ±5.68%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=1                  ***     25.97 %       ±2.66%   ±3.54%   ±4.62%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=10                 ***     28.12 %       ±2.37%   ±3.16%   ±4.13%
async_hooks/async-local-storage-getstore-nested-run.js n=10000 storageCount=100                ***     10.92 %       ±2.90%   ±3.86%   ±5.03%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=0               ***     11.72 %       ±2.49%   ±3.32%   ±4.33%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=1               ***    137.10 %       ±3.89%   ±5.20%   ±6.81%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=10              ***    263.46 %       ±5.99%   ±8.07%  ±10.70%
async_hooks/async-local-storage-propagate-asyncresource.js n=1000 storageCount=100             ***   1650.06 %      ±41.27%  ±55.62%  ±73.83%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=0                            0.16 %       ±2.10%   ±2.80%   ±3.65%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=1                   ***    195.33 %       ±3.87%   ±5.21%   ±6.90%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=10                  ***    807.12 %      ±11.39%  ±15.35%  ±20.38%
async_hooks/async-local-storage-propagate-promise.js n=100000 storageCount=100                 ***  14494.18 %     ±190.06% ±256.14% ±340.05%
async_hooks/async-local-storage-run.js n=10000000                                              ***     41.39 %       ±1.67%   ±2.22%   ±2.89%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

I'll get started on a CR to add the builtins to V8 and will update this again once we can get the changes for that and thenables support backported.

Qard avatar Aug 24 '23 11:08 Qard

I've submitted a V8 CR to add the TurboFan builtins for the get/set functionality. I'm also still waiting for reviews on the thenables fix. Feel free to review/comment on either. Once those both land I'll get them backported to Node.js and then update this PR. 🎉

Qard avatar Aug 24 '23 14:08 Qard