benchmarking Performance impact of async

I've already started a related discussions on Twitter earlier here, but Twitter doesn't scale for this kind of discussion, so I thought I should move it here, so it's easier to follow the discussion. For background: @jasnell approached me at NodeConfEU in November 2017, asking for help on the performance of async_hooks on the V8 side, especially when it comes to the Promise lifecycle hooks. And I've talked briefly with @thlorenz about it during the conference (although I have to admit that I was using the wrong terminology and thus kinda ruined the conversation, sorry). And we had chatted briefly about this as well with @trevnorris during one of the CTC-V8 calls.

Since Promise based APIs are likely becoming a (big) thing for Node 10 and at the same time there are estimates (i.e. by @ofrobots) that by the end of next year every Node server in enterprise is going to run with async_hooks, we should start a discussion about this early one to stay ahead of the problem before it becomes a real problem for our users.

In the last year @gsathya and @caitp already spent a lot of effort optimizing promises in V8 in general, so we're in a pretty good position wrt. baseline performance!

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way. I think we need both a set of micro-benchmarks to drive certain, in addition to real-world code that makes heavy use of promises, i.e. koa.js or fastify servers maybe? I feel that these benchmarks are most important, otherwise we might easily sink a lot of effort into work that doesn't help real-world use cases in the end. That's why I opened the issue on the benchmarking WG.

Comments and contributions are very welcome!

Dec 16 '17 20:12 bmeurer

There's been a related discussion in https://github.com/nodejs/promises/issues/31 earlier this year, which yielded a couple of interesting improvements and insights.

Pinging @nodejs/performance @nodejs/async_hooks

Dec 16 '17 20:12 bmeurer

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.

Particularly about async_hooks and promises or promises in general?

For async_hooks and promises - I think a server-application benchmark could be good here - with contexts through async functions through several layers of calls - I'd then listen to promiseResolve and write interesting things about the app while sending it "user data" and measure how long it takes. "interesting things" can be things like how many times the database is hit, request latency for different request stages etc.

For promises in general there are at least 2-3 good ideas in that thread that can get native promises finally faster than userland alternatives like bluebird. Namely skilling the microtask queue when possible and inlining calls - not to mention async iterators that aren't fast yet but could revolutionize programming in Node.js if they were.

Dec 16 '17 21:12 benjamingr

I'd like to use this forum to come to an agreement on what the concrete use cases are (at least estimate that), which aspects of the performance matter the most (i.e. promises created inside of C++ vs. in JS land), and what are useful benchmarks to drive the performance work in a meaningful way.

Particularly about async_hooks and promises or promises in general?

Improving promise performance in general is also a good idea, but this particular issue is primarily about the impact of async_hooks, i.e. about the performance cliff that you fall off when you enable async_hooks.

As for promises in general and in particular async/await and async iterators, we will need good benchmarks as well to drive meaningful performance work. I've started to collect some ideas for promise performance improvements, but I don't feel like it's a good approach to measure our success in terms of the bluebird benchmarks.

I think @mcollina had some comments about that before. Can we derive a simple representative performance test from fastify to get things going?

Dec 17 '17 17:12 bmeurer

Specifically, regarding PromiseHooks and performance I would like to see:

kDestroy is added. I think in many cases it is straightforward to know when a promise can no longer be refered too. Thus we could emit the destroy hook much sooner, in other cases it would be fine to wait for garbage collection, but if V8 could instrument that for us directly it would be great.

I've gotten many real-life complains from both APM providers and other async_hooks users that they experience a "memory leak". The reality is, that there is no memory leak but that it unintuitive that the destroy event is not emitted as soon as possible but rather at garbage collection.

I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit and set then just set the asyncId and triggerAsyncId (two doubles) on the internal fields.

Extra feature requests that are not performance related.

Know the resolve value in kResolve. https://github.com/angular/zone.js/pull/795 currently needs this.
Don't break executionAsyncId() and triggerAsyncId() when async_hooks is not enabled. This might be very theoretical. It would involve enabling PromiseHooks unconditionally. If we also had kDestroy we would only need to set two doubles on the internal fields.

/cc @addaleax

Dec 17 '17 20:12 AndreasMadsen

@AndreasMadsen Thanks for the feedback, that's very interesting. These seem to be more on the feature request side.

I think in many cases it is straightforward to know when a promise can no longer be refered too.

Do you have a suggestion/intuition here how to do that?

I'm guessing this would improve performance a lot, as we would no longer have to wrap the promise object itself to get notified about garbage collection. Essentially we could just create an unreferenced basic object on kInit and set then just set the asyncId and triggerAsyncId (two doubles) on the internal fields.

So you're saying that by changing the API and adding machinery to implement kDestroy on the V8 side we should be able to improve performance significantly?

Dec 18 '17 06:12 bmeurer

cc @gsathya @caitp

Dec 18 '17 06:12 bmeurer

Do you have a suggestion/intuition here how to do that?

Often promises are only "referenced" once, either by .then() chaining or they are returned and used in await. At least for the user, it is quite obvious that the promise can't be referenced after that.

function wait(ret, ms) {
  // hint: the promise is directly returned
  return new Promise(function (resolve, reject) {
    setTimeout(() => resolve(ret), ms);
  });
}

function main() {
 // hint: the promise is directly used in `await` without being assigned to a variable
 await wait(1, 10) // after this line the user expects the `destroy` hook to emit
 await wait(2, 10) // same for this promise
 await wait(3, 10) // same for this promise
}

So you're saying that by changing the API and adding machinery to implement kDestroy on the V8 side we should be able to improve performance significantly?

Yes, that would be my guess. Wrapping and unwrapping the promise object is definitely the most expensive part of our PromiseHooks integration. Not having to do that, should improve performance.

Dec 18 '17 07:12 AndreasMadsen

cc @jaro-sevcik

Dec 18 '17 07:12 bmeurer

Often promises are only "referenced" once, either by .then() chaining or they are returned and used in await. At least for the user, it is quite obvious that the promise can't be referenced after that.

If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome. Fast libraries all hack this by having the single listener case optimized - but an engine that can prove it and avoid allocating extra stuff would be awesome. In terms of promise hooks I suspect it would also simplify things a lot for the common case.

Dec 18 '17 08:12 benjamingr

On the benchmarks side, we can also consider Hapi v17, which is completely async-await based.

The problem with real-world code is that it typically involves a database, and that makes it very hard to measure small improvements. IMHO we should measure:

receive a request, performs a query in the DB, returns value as JSON
receive a request, performs a query in the DB, performs another query in the DB, returns a JSON
receive a request, performs a query in the DB, send an HTTP request somewhere (maybe using fetch), returns a JSON

I fear this would have to be a synthetic application.

Dec 18 '17 08:12 mcollina

I fear this would have to be a synthetic application.

This is what Doxbee did (the bluebird benchmark suite) which you're familiar with - we can fork and extend it.

Dec 18 '17 08:12 benjamingr

@benjamingr I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

Dec 18 '17 09:12 mcollina

Thanks @AndreasMadsen, @benjamingr and @mcollina! This is definitely interesting.

Yes, that would be my guess. Wrapping and unwrapping the promise object is definitely the most expensive part of our PromiseHooks integration. Not having to do that, should improve performance.

This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?

If V8 could detect that with escape analysis and optimize the return value of that function to an entirely different implementation that would be awesome.

The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit will cause the promise to escape. PromiseHooks enable more observable behavior, potentially restricting optimization.

Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

@mcollina I agree, 💯. The hard part would be to try and get reproducible results from such a benchmark though.

@AndreasMadsen -- Can you link to a real world usage of async hooks? How are developers using async hooks? Is it for async stack traces? Is the kResolve hook useful? How is the kDestroy hook used? Is it useful for context propagation? PromiseHooks have a big surface area -- they give you the promise and let you do whatever. If we can find patterns in real world and benchmark these patterns, then we can start to optimize for that and go from there.

Dec 18 '17 11:12 gsathya

@mcollina My gut feeling is that we will need some kind of micro-benchmarks to drive individual work items. Plus real-world code to see overall impact and find appropriate areas worth looking into.

@benjamingr Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.

Dec 18 '17 12:12 bmeurer

This seems surprising to me. I would think that calling out to the javascript hook defined by the user from C++ to be the most expensive part. Have you tried benchmarking with and without the wrapping?

I admit, I have not done a full matrix comparison. But this was the observation when we initially added the wrapping. It also depends on what hooks are used. I remember that we have some numbers for this but I can't find them.

The way escape analysis works is by figuring out if a given allocation is unobservable. But, we lose all that with PromiseHooks as the kInit will cause the promise to escape.

I can't think of any that actually use the promise object. If we had the kDestroy hook we could properly live without it. We would also need kInit to allow us to return a custom id (the asyncId) and if PromiseHook would carry that context information for the other events. The parentPromise would then have to be replaced with that id.

Can you link to a real world usage of async hooks?

We use it in node itself:
- https://github.com/nodejs/node/blob/master/lib/internal/trace_events_async_hooks.js
- https://github.com/nodejs/node/blob/master/lib/internal/inspector_async_hook.js
- https://github.com/nodejs/node/blob/master/lib/domain.js
APM GoogleCloudPlatform: https://github.com/GoogleCloudPlatform/cloud-trace-nodejs/pull/538
APM elastic: https://github.com/elastic/apm-agent-nodejs/pull/77
long stack traces: https://github.com/AndreasMadsen/trace
Continues local storage: https://www.npmjs.com/package/cls-hooked

Is it for async stack traces?

That and continues local storage. Hopefully, we will see it being used as a profiling tool too.

Is the kResolve hook useful?

kResolve is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.

How is the kDestroy hook used?

It is not implemented by PromiseHooks so it isn't used. But it would directly replace our destroy hook. The destroy hook is used to clean up information that is stored in a new Map with the asyncId as the index value.

issue about promises not being destroyed: https://github.com/nodejs/node/issues/14446

Is it useful for context propagation?

Very.

Dec 18 '17 12:12 AndreasMadsen

Is the kResolve hook useful?

kResolve is used by https://github.com/angular/zone.js/pull/795 - but it is not enough for what they want. Either they need the resolved value or an extra event.

I've heard that zone.js might be abandoned by the Angular folks. And async_hooks are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?

Dec 18 '17 12:12 bmeurer

@mcollina

I think we are all more concern on overall impact of async await rather than a comparative measurement between implementations. Plus, I think we should focus on overall performance including a web framework, a database and HTTP requests rather than just the promise/async await layer.

I think what we're measuring async/await against here (with async_hooks) is callbacks. I agree with the database assumption, I'd focus on the most common stack and write the same app in two ways (callbacks and promises) and add instrumentation (for async hooks).

I tend to agree with database/web framework and http requests - but we need to make sure those are as thin as possible so we don't end up with an implementation that is fine "for the average case" but completely breaks for apps that have a harder performance requirement.

What about a benchmark that does:

Make HTTP requests, but make the requests on a unix socket rather than a tcp one to minimize the affect of the network stack. I'm not sure it's not better to swap out the HTTP implementation for the test altogether (and just give koa/express/whatever a "mock HTTP interface") but that's risky.
Make DB requests, but very low overhead requests, preferably to the same machine and over a unix socket - this part feels very unreliable - I'm not sure how we can be certain our measurements are significant. Probably a Node server running an in-memory database on the other end (maybe the same one that's making the http requests). Measuring two processes is harder though and more prone to OS scheduling quirks.
I'm not sure I'd serialize to JSON so easily - since that might end up dominating the time the app takes and a lot of more performance conscious apps don't do it very often.

In general, it's mostly a scope question of whether Node.js should optimize for "the average app" or for "performance sensitive apps".

@bmeurer

Escape analysis could indeed eliminate promises (sometimes), but with promise hooks that becomes impossible, since every promise already escapes/leaks to the hooks.

I'm not sure I follow - so I'll clarify my request/suggestion. Escape analysis can (probably) greatly optimize most of the modern usage of promises in async/await where we're awaiting a direct invocation of an async function - and use much simpler promise code.

async function bar() {

}

(async function foo() {
  // bar is also an async function, this call only has one listener and doesn't escape
  // this is, presumably "the common case".
  await bar(); 
})();

In this case, we don't actually need to allocate a real promise object - we can continue the execution of foo when bar ends. Properties of promises like multicast and assimilation aren't "required". We don't need await to call Promise.resolve on its parameter since we know it's an async function or anything like that either.

If we can prove other nice properties for bar (like that it doesn't throw) it gets even better and we can inline bar entirely in foo (and 'defer a microtick' since we're awaiting it). This would let us drop any allocation or book-keeping altogether.

I realize this is far from the actual implementation but the benefit is also huge and it doesn't sound "too crazy" (I hope :)).

For my specific request: when/if this is optimized, you can inline async_hooks calls too in the above code. Instead of destroy when a promise allocated by bar gets GCd, there is no promise created (or even a call to bar whose code is inlined in foo) and destroy is emitted synthetically.

Dec 18 '17 14:12 benjamingr

@benjamingr The thing with promise hooks (as required by async_hooks) is that we need to pass the JSPromise object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.

Dec 19 '17 20:12 bmeurer

The thing with promise hooks (as required by async_hooks) is that we need to pass the JSPromise object to Node, which immediately makes all escape analysis useless as TurboFan has no idea what Node does with the object and cannot perform scalar replacement of the fields.

@bmeurer As said, we don't actually need the JSPromise object in node, we just need a way to pass context.

Dec 19 '17 20:12 AndreasMadsen

That's not how it's implemented currently. I think I need to look at the Node core side again. Don't you need to slap ids onto the JSPromise object?

Dec 19 '17 20:12 bmeurer

@bmeurer yes we do. If PromiseHooks allowed us to set those ids (just one 64bit id really) in kInit and then passed them to the other events we wouldn't need the JSPromise object.

Our PromiseHook code is here: https://github.com/nodejs/node/blob/master/src/async_wrap.cc#L283

Dec 19 '17 20:12 AndreasMadsen

@AndreasMadsen But we still need to materialize the JSPromise object to pass it to the kInit hook. So you already defeat the escape analysis at that point. Would it be possible to eliminate the need to pass a concrete JSPromise object to the promise hooks completely?

Dec 21 '17 19:12 bmeurer

Would it be possible to eliminate the need to pass a concrete JSPromise object to the promise hooks completely?

This is what I'm saying! We are only using it propagating context, so if PromiseHooks provided a different way of doing this, such as propagating a 64bit id set in kInit then that would work just fine.

Dec 21 '17 22:12 AndreasMadsen

@AndreasMadsen Staring at the PromiseHook on the Node side, you store the PromiseWrap object on the JSPromise and you do check the parent JSPromise during kInit. So this would be a major refactoring on how this works. A few questions come to mind here:

Can we get rid of the PromiseWrap for every JSPromise?
Can we move the id slapping logic to V8 instead? So that V8 can set these internal fields upon creating of JSPromise objects.

@gsathya does that make sense to you as well? Do we have other users of the PromiseHook?

Dec 22 '17 08:12 bmeurer

Staring at the PromiseHook on the Node side, you store the PromiseWrap object on the JSPromise and you do check the parent JSPromise during kInit.

Correct. The PromiseWrap holds the ids (asyncId and triggerAsyncId) and help us emits the destroy event. We also send it to the user as the resource object, where it contains two values:

resource = {
  promise: JSPromise,
  // lookup parent asyncId on the internal field of the parent PromiseWrap
  parentId: parentJSPromise.[[PromiseWrap]].[[asyncId]]
}

So this would be a major refactoring on how this works.

On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.

Can we get rid of the PromiseWrap for every JSPromise?

yes. It would require two things:

V8 does the id slapping logic for us (at least one id).
V8 emits a kDestroy event.

These are the only critical things we use the PromiseWrap object for and hence the JSPromise as well.

Can we move the id slapping logic to V8 instead? So that V8 can set these internal fields upon creating of JSPromise objects.

yes

We would need at least one id (asyncId). We have another id (triggerAsyncId) but we could store that in a map as the asyncId is unique for each promise.

Dec 22 '17 09:12 AndreasMadsen

@bmeurer,

I've heard that zone.js might be abandoned by the Angular folks. And async_hooks are available on the Node side only, so they still need a different solution for client side. Not sure who to ask about this?

Angular current provide a NgNoopZone to improve performance, it will not load zone.js,but it is just an additional option when user want to fully control when to do change detection.

currently in zone.js, I want to know when promise is really resolved instead of get a kResolve callback because I want to

implement a asynchooks based zone.js.
let zone.js be able to use native async/await.

I am still looking for walk around, but it seems to be a v8 feature request. I also post a feature request there , https://bugs.chromium.org/p/v8/issues/detail?id=6862

Dec 22 '17 10:12 JiaLiPassion

@AndreasMadsen @bmeurer

So this would be a major refactoring on how this works.

On our side, it would be API breaking but I don't see a big issue here. What would be missing is the resource.promise object but I have never seen anyone actually use that. And I don't think it is worth the performance overhead.

Note that async-hooks are still experimental, so the cost of making a semver breaking change is not that high, specially if it has significant benefits. Regardless, if V8 was to implement this today, this would only get into Node 10, a semver major release. Whether or not the change can be backported to Node 8 and 9 would depend on the complexity rather than semver-ness of the impact to async-hooks API.

Dec 22 '17 16:12 ofrobots

@bmeurer I've tried to implement a user request context using Async Hooks. I wrote about it a blog post. We used it for debugging purposes as well when we had problems to understand where exactly our bug is.

Dec 26 '17 10:12 guyguyon

@guyguyon can you share information about the performance impact and what parts were painful in that experience?

Dec 26 '17 10:12 benjamingr

I know the idea is to share production experience but I didnt dare to upload it to production yet :/. From our Dev experience, I'm deleting a request context using the destroy hook. Since on some resources it will be called by the garbage collector, it caused the request context object to grow fast.

Dec 26 '17 10:12 guyguyon

benchmarking
benchmarking copied to clipboard

Performance impact of async_hooks

benchmarking benchmarking copied to clipboard

Performance impact of async_hooks

benchmarking
benchmarking copied to clipboard