diagnostics icon indicating copy to clipboard operation
diagnostics copied to clipboard

Identify async_hooks use cases beyond AsyncLocalStorage

Open Qard opened this issue 4 years ago • 34 comments

It seems to be the general consensus of this working group that async_hooks should not be directly made stable due to it exposing internals. The AsyncLocalStorage API provides a higher-level solution to many of the use cases of async_hooks, however some use cases still remain. I would like to identify what those use cases are so we can introduce safer APIs solving those problems and eventually move toward deprecating direct use of async_hooks or perhaps just the unsafe aspects of it.

Known use cases:

  • long stack traces
    • Shouldn't make any use of the resource objects, so should be reasonably safe.
  • tracking handle lifecycle
    • The subsystem type, id, and event timings can be used to track high-level lifecycle safely
    • Deeper awareness may involve inspecting the resource object, which could be unsafe
  • measure time spent in blocking code
    • Mainly just need timing between before/after and maybe type for extra context, resource not required

What other use cases are there? I know clinic uses it for some things, perhaps @mcollina has some insight on that?

Qard avatar Oct 16 '20 17:10 Qard

https://github.com/naugtur/blocked-at - I used async_hooks to detect the event loop blocking function.

They can also be used to observe asynchronous flow execution, count number of async resources of certain type being created. https://github.com/naugtur/debugging-aid

None of these use cases really need access to the resource reference itself. The most valuable information is timing and stack trace form init callback in these cases.

naugtur avatar Oct 16 '20 21:10 naugtur

Added to the list. 👍

Needs we have so far:

Selectively capture stack trace at init and make available in descending async contexts

I believe this could be achieved through AsyncLocalStorage with the addition of an init trigger that could be used to capture the stack trace at that point and store it in the storage. Retrieving is just a matter of getting the value from the storage.

We could even have something like executionAsyncStackTrace() to do the same as what executionAsyncResource() does for resources, but for stack traces. Though, for performance reasons, we'd probably need to make it so it has to be turned on explicitly. 🤔

Record event timings

We can already capture general timings with perf_hooks.monitorEventLoopDelay(...). Could expand on that with resource type scoped histograms also, if individual timings for each run are not as important. Correlating with stack traces would be harder in that case though.

Another option is that this too could probably be layered on top of AsyncLocalStorage, storing an hrtime in an init trigger. It would need some additional system of computing and emitting the diff hrtimes in the other async_hooks events though, and I'm not sure exactly what that would look like or what the performance impact would be.

Do we need individual timings here, or could we reuse the Histogram interface from perf_hooks event loop monitoring but split per resource type and event type combination?

Qard avatar Oct 17 '20 00:10 Qard

A few of our clients uses cla-hooked to keep the request/response/user settings/logger and a lot of other critical data. The lose of context would be extremely problematic.

mcollina avatar Oct 17 '20 08:10 mcollina

Could we not just update cls-hooked to use AsyncLocalStorage internally? And probably even encourage users of it to just switch to using AsyncLocalStorage directly? They're functionally the same.

Also, to be clear: I'm definitely not proposing deprecating async_hooks right away. I'm just working on clarifying the path towards being able to do that someday as it's not great that we have this unsafe API exposing internals. I want to gradually migrate all our use cases for async_hooks to safer APIs so we can someday deprecate it.

Qard avatar Oct 17 '20 16:10 Qard

In Bubbleprof we use async_hooks to keep track of all the states of an async activity.

mcollina avatar Oct 17 '20 18:10 mcollina

"All the states" being the async_hooks event transitions, or do you do some deeper inspection of the resource objects? If you interact with the handles directly, why? Is it possible to encapsulate any handle access needs into a new API that doesn't expose the internal handle directly?

Qard avatar Oct 17 '20 18:10 Qard

One feature I miss in AsyncLocalStorage is to decide if context passing should happen or not based on the resource type. Technically it's correct to pass context via setInterval but in real life it's not wanted in each case. This doesn't require the resource itself. A slightly different approach would be to configure some timeout when a context should be no longer propagated.

The usecase where this is helpful are lazy initalizations of e.g. a logger, database connection,... within the first HTTP request as they stay assoziated with this request forever.

I have also an idea in mind to sum up CPU and wall clock time for an async tree. This doesn't require the resource but before/after hooks.

Flarna avatar Oct 17 '20 18:10 Flarna

That propagation decision is potentially something that could be achieved with the more limited init hook that'd also be needed by the stack trace capturing and event timing needs mentioned above, if we design it right. I see something emerging there.

Qard avatar Oct 17 '20 18:10 Qard

@Flarna the decision about context passing (or not) is made based in resource type, not reference to the resource itself, right?

naugtur avatar Oct 17 '20 18:10 naugtur

For timers the timout would be helpful. some people use setTimeout(0) as replacement for nextTick which is interesting to track. But timers for minutes are usually not of interest for a tracing usecase.

Flarna avatar Oct 17 '20 18:10 Flarna

"All the states" being the async_hooks event transitions, or do you do some deeper inspection of the resource objects? If you interact with the handles directly, why?

We just inspect the type of the handle.

mcollina avatar Oct 17 '20 19:10 mcollina

@mcollina @Qard I don't work for NearForm/clinic anymore. But one obscure usage is to detect http.Server instances and what address+port they are listening on. This is to automatically benchmark them, without the user specifying the port and address. You can look at https://github.com/mafintosh/on-net-listen.

I can understand replacing the resource object. As it is now, it is unsafe. However, I think there is value in attaching metadata to a resource. I don't get deprecating all of async_hooks, and providing higher-level APIs. I think it has always been the philosophy of node.js to provide low-level APIs to enable users to write their own modules. Deprecating async_hooks would conflict with that philosophy.

AndreasMadsen avatar Oct 17 '20 20:10 AndreasMadsen

The use case of detecting a net server is one of the use cases I see for diagnostics_channel, so I think we have a solution incoming for that.

As for the low-level APIs comment: I don't think this is a low-level versus high-level thing, this is a safe versus unsafe thing. Yes, there are many other low-level APIs, but they're generally all much safer and are not exposing undocumented internals. My goal here is to find the path forward to removing unsafety, whether that be through deprecating parts of the async_hooks API or deprecating the API as a whole. This is a sketchpad/brainstorming session to figure out what the possibilities are to achieve the results we want in a better way.

My personal perspective on async_hooks is that it conflates several unrelated things because one specific use case had multiple problems to solve--the context propagation need required barrier tracking and there are multiple forms of async barrier that needs to be propagated through. For mostly any other use case, the distinction between promises, timers, and libuv handles/requests is important and should probably not be presented in an aggregate form which then has to be filtered back to the form we actually want because that's just an unnecessary expense and complicates the internals. The problem with async_hooks in it's current form is actually not that it's too low-level, it's that it's an arbitrary mix of too much abstraction where none is wanted and not enough where some is needed.

My ultimate goal is to raise AsyncLocalStorage to a deeper level of integration, bypassing much of the unnecessary bits of async_hooks imposed by using the user-facing API. In addition, I want to create some more purpose-focused APIs to solve the other needs of async_hooks, which can be built with a more appropriate scope. By constructing more intentionally designed APIs that only operate on what needs to be operated on, many of these scenarios can achieve much better performance, rather than just having one global firehose of every async thing ever. There's a reason async_hooks performance is still bad in many scenarios: it does too much. 😬

Qard avatar Oct 17 '20 21:10 Qard

@Qard should we move this to a diag-deep-dive instead of regular agenda item?

mmarchini avatar Nov 25 '20 18:11 mmarchini

That could work. I tried to just go for async, but it seems to have not got a lot of engagement. A deep dive might help with that.

Qard avatar Nov 25 '20 20:11 Qard

@mcollina:

In Bubbleprof we use async_hooks to keep track of all the states of an async activity.

Well, to be clear, we use the trace events file that contains the lifecycle information. Bubbleprof does not use async_hooks directly. So long as we could extract the same information in some kind of log (doesn't even need to be the current trace events file) then bubbleprof should still be able to do its thing.

jasnell avatar Jan 08 '21 14:01 jasnell

Bubbleprof does not use async_hooks directly.

@jasnell FYI, bubbleprof collects stack-traces directly with async_hooks. https://github.com/clinicjs/node-clinic-bubbleprof/blob/main/injects/logger.js#L43

AndreasMadsen avatar Jan 08 '21 16:01 AndreasMadsen

Oh that's right forgot about that piece lol ... Been a while since I opened that particular bit of the code 😂 Still, it doesn't need async hooks as a stable api to do so. We could look at extracting the data other ways.

jasnell avatar Jan 08 '21 16:01 jasnell

Looking at that particular example, seems like the skip logic is basically already covered by AsyncLocalStorage and it'd just need an init hook to capture the stack trace. The idea of adding an init trigger of some sort to AsyncLocalStorage has already come up, so I think that could be a possible solution to that.

Qard avatar Jan 09 '21 18:01 Qard

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions[bot] avatar Apr 10 '21 00:04 github-actions[bot]

I remember reading that async_hooks might be the replacement for the deprecated Domains a long time ago. I've been out of the loop for a while, but for catching async errors at any point during a request so that proper cleanup can be done while still allowing other requests to finish, is that now covered by using async/await with try/catch? Is the functionality described in https://nodejs.org/api/domain.html now unnecessary? That would be a use case for which I would be interested in having a higher-level API.

philraj avatar Jul 22 '21 14:07 philraj

No, it's not a replacement for domains. It was a replacement for the old internals of domains. It's not likely domains will disappear any time soon, so I don't see any issue there. And hopefully domains can be made better with better designed replacements for async_hooks in the future. To be clear, I'm not advocating for removing async_hooks. I'm only suggesting making it an internal implementation detail of other APIs, or an abstraction over more purpose-built APIs, and discouraging use of async_hooks in the future.

Qard avatar Jul 24 '21 04:07 Qard

No, it's not a replacement for domains. It was a replacement for the old internals of domains. It's not likely domains will disappear any time soon, so I don't see any issue there. And hopefully domains can be made better with better designed replacements for async_hooks in the future. To be clear, I'm not advocating for removing async_hooks. I'm only suggesting making it an internal implementation detail of other APIs, or an abstraction over more purpose-built APIs, and discouraging use of async_hooks in the future.

That all sounds good to me. Don’t mind my ignorance, just wanted to check if that important use case was still being considered globally. Given what you said I’m confused about why domains have been listed as deprecated for 6 years or so, but that’s not really relevant to this discussion.

philraj avatar Jul 24 '21 04:07 philraj

Domains have been listed as deprecated for a long time because the internals pre-async_hooks were quite unstable and had a lot of problems, then there was little to no maintenance of it for a long time. It has improved a bunch in recent years, but it's still a very complicated system for which it is very hard to verify 100% correctness. It still has many issues that make it possible for code to escape the sandbox, so I wouldn't consider it a very trustworthy way to sandbox code. It doesn't isolate especially well. You're probably better off just putting stuff in a worker thread if you want proper isolation.

That being said, many uses of it remain in old npm modules that continue to be popular, so it's not likely to go away any time soon. The recently introduced "legacy" status is likely to be a better label for it, and I expect it will get moved to that soon.

Qard avatar Jul 24 '21 15:07 Qard

to stay within the bounds of your top-level summary, long stack traces will always be a great use case. yet the question would be; what sort of performance hit are you willing to take in order to get that value

having a clean impl as close to the 'native' Node binary as possible would minimize that overhead

that'd be my 2nd wish. my 1st had always been for CLS / a ThreadLocal equivalent ... and AsyncLocalStorage is that very thing, so 🙏

cantremember avatar Aug 06 '21 03:08 cantremember

We are building a high-perf large ML project in nodejs using tensorflow.js and a lot of custom computation modules. While majority of calculations are offloaded to CUDA, we still run a lot of multithreaded computations in node/typescript and so far it's been very robust, except for async_hooks. The whole app that we've got is asynchronous and this constant unnecessary dribbling into internal/async_hooks.js and internal/inspector_async_hook.js is very unfortunate and CPU-wasting. We would appreciate if there was a possibility to switch off async_hooks entirely (e.g. as a command line argument) for high-perf ML/computation apps. We explored switching to C++/rust, and we found typescript to be far much easier to deal with, especially in interactive settings, and it has the built-in eval() that we rely on heavily.

WillAvudim avatar Aug 07 '21 11:08 WillAvudim

It's turned off by default. You should only see it if something in your app is using it.

Qard avatar Aug 08 '21 23:08 Qard

Sequelize (ORM) uses CLS-hooked for managed transactions. They had some had time with CLS + async in the past, but it looks like they could not move from it for some reason, IDK (I'm a consumer).

There's a good explanation in the cls-hooked package adventures, that may be valuable: https://www.npmjs.com/package/cls-hooked#a-little-history-of-asyncwrapasync_hooks-and-its-incarnations

leodutra avatar Aug 20 '21 19:08 leodutra

@papb I wonder if this is of interest.

leodutra avatar Aug 20 '21 19:08 leodutra

It should be straightforward to switch from cls-hooked to AsyncLocalStorage.

Qard avatar Aug 20 '21 19:08 Qard