proposal-async-context icon indicating copy to clipboard operation
proposal-async-context copied to clipboard

Context termination

Open jridgewell opened this issue 2 years ago • 38 comments

Opening a placeholder issue on "Task termination", which Chrome has brought up as a necessary part of their Task Attribution API. They want to know when a particular task execution has finished, so that they can record the total time spent (eg, how long after a click does it take to paint the next page?). Someone has also mentioned that this is a usecase that Angular needed in their Zone.js implementation.

@littledan said he was thinking on an initial sketch on how this could be exposed in JS.

jridgewell avatar May 02 '23 19:05 jridgewell

Since a task context can be captured by a wrapped function, it sounds like the termination of that task is intrinsically tied to the garbage collection of all functions that are tied to the context. We cannot introduce a mechanism that would allow to more precisely observe the garbage collection behavior than what is already allowed through WeakRef / FinalizationRegistry.

Given the above, would it be acceptable to simply require that users interested in task termination use an object as the .run() value, and register that object with a FinalizationRegistry to be notified of that run's completion?

mhofman avatar May 02 '23 21:05 mhofman

That was my initial suggestion, but apparently Chrome needs a much tighter bound on the time after the task terminates, so that timestamps can be recorded.

What's been described isn't actually tied to the lifetime of a userland value, but to the global context mapping itself. The value placed into inside a run may or may not have been GC'd yet. What Chrome needs is a signal that the global context mapping (which is not userland observable) which held the value is no longer active and is not held by any snapshots.

I imagine if user code does takes a manual snapshot, then task termination is controlled by GC of that snapshot (and would use the much slower FinalizationRegistry times). I think that might be ok? But if only promises and async/await do the snapshotting, then we can use a fast path once all pending execution has completed.

CC @yoavweiss, what would be the behavior if user code manually snapshotted the global context? Would it be ok for that task to never terminate if the snapshot is never freed?

jridgewell avatar May 02 '23 22:05 jridgewell

I think it's acceptable for an internal host implementation to use a faster path, but what ends up exposed to the JS program must not be observably more accurate. If the task termination is recording timings accurately, the host should probably not reveal that timing information to the program until the program itself can observe the task termination (aka the host should not create an alternative path of observing the finalization of JS objects).

mhofman avatar May 02 '23 22:05 mhofman

taking a snapshot of a context is not bound to a specific operation/task. It's not the task which is captured its a context. AsyncLocal propagates the context across several operations. Whenever a new task (e.g. promise,...) is created it picks up the context and propagates it across the new async operation.

in node.js the best fit API to monitor individual tasks would be the init and destroy hook of async_hooks. These details are intentionally not exposed by AsyncLocalStore. To my understanding the proposal here is more to provide the functionality similar to AsyncLocalStore and not the low level async hooks.

Flarna avatar May 03 '23 06:05 Flarna

taking a snapshot of a context is not bound to a specific operation/task. It's not the task which is captured its a context.

I disagree. Capturing a snapshot of the current context should extend the current task, because I can reenter that task at a later time. While doing the endojs security review, we actually discovered that snapshotting is effectively the same as capturing a pending promise, it just exposes capability of creating a new subtask in a sync way.

in node.js the best fit API to monitor individual tasks would be the init and destroy hook of async_hooks. These details are intentionally not exposed by AsyncLocalStore.

Yah, this is effectively the destroy hook. I'm not certain how it would be exposed yet (or if it would just be an embedder API and not userland).

jridgewell avatar May 03 '23 18:05 jridgewell

But one can also "capture" the active context by calling get(), store it in some local variable and activate it later. This is not that uncommon in case you have to manually propagate a context and there is already some object representing the "thing" to track.

The destroy hook operates per single operation, not per activated context at the end of the propagation chain.

Flarna avatar May 03 '23 19:05 Flarna

But one can also "capture" the active context by calling get(), store it in some local variable and activate it later

You can get a value, not the overall context mapping. You can keep an individual value alive, but that doesn't mean the task is still computing more work.

A snapshot captures the full context so it can be restored later, which I think should imply it's an extension of the current task. Eg, if I needed to postMessage to a web worker, and stored an AsyncContext.wrap(cb) to be invoked when the worker responds, I think that should count as a continuation. The same if I instead do new Promise((res) => { window.onMessage = res; }).then(cb) (I think we're both in agreement that this is the same task).

jridgewell avatar May 03 '23 21:05 jridgewell

What is the "overall context mapping"?

Maybe I have the wrong understanding of the terms used.

context.run(span, cb) sets span as active and span is propagated into any async operation (task?) created within cb. context.get() can be used to read span.

What is the observable difference between capturing span in a closure via context.wrap and activate it by invoking this closure compared to manually keeping a reference to span and later call context.run(span, otherCb) again?

Or is context.wrap() intended to capture the state of all AsyncContext instances not only from my owned instance?

some examples to illustrate the difference:

opentelemetry node.js context manager (uses AsyncLocalStore internally) offers bind which captures a context in a closure which activates it later by calling with (same as run here).

On the other hand the node.js built in AsyncResource.bind creates a new AsyncResource which causes that all active stores from all AsyncContext instances are propagated/captured by this new resource.

So what is "the task" here? And which kind of propagation should be seen as task extension? Or do we have one task per AsyncContext instance and variant A extends it for one instance and B for all of them?

Flarna avatar May 03 '23 21:05 Flarna

What is the "overall context mapping"?

The mapping of all AsyncContext values (the same as the __storage__ variable in the slides and old sample impl)

const a = new AsyncContext();
const b = new AsyncContext();

// context: {}
a.run(1, () => {
  // ctx: { a: 1 }

  b.run(2, () => {
    //ctx: { a: 1, b: 2 }
  });

  // ctx: { a: 1 }
});
// ctx: {}

What is the observable difference between capturing span in a closure via context.wrap and activate it by invoking this closure compared to manually keeping a reference to span and later call context.run(span, otherCb) again?

The first continues the same context (the mapping is referentially equal, not just deeply equal), where as the second creates a new sub-context under whatever is active at the time of the run.

Or is context.wrap() intended to capture the state of all AsyncContext instances not only from my owned instance?

I think this is the confusion. It's AsyncContext.wrap(), not context.wrap() (a static method on AsyncContext, not a prototype method on instances). Wrap captures every value currently present in the mapping to be restored later, the same way p.then() captures every value in the mapping to be restored when the promise settles.

The only way to get a single value is via .get(), which doesn't continue the current task.

opentelemetry node.js context manager (uses AsyncLocalStore internally) offers bind which captures a context in a closure which activates it later by calling with (same as run here).

On the other hand the node.js built in AsyncResource.bind creates a new AsyncResource which causes that all active stores from all AsyncContext instances are propagated/captured by this new resource.

AsyncContext.wrap() is the same as AsyncResource. We're likely to change the API to a new class, which might help with this confusion.

So what is "the task" here? And which kind of propagation should be seen as task extension?

In words, it's an execution started by any ctx.run() (could be the root or a child). Calling a.run(1, () => { b.run(2, () => {}) }) creates 2 tasks, with the inner being a child of the outer. If a promise is pending in the outer task, then the outer task is still pending. If a promise is pending in the inner task, then the inner and outer are pending.

In psuedocode, a task is a instance of the __storage__ map from the old sample code (note that calling ctx.run(val) creates a new copy of the mapping to insert the ctx: val entry). As long as the Map instance is alive, the task is pending. When the map is garbage collected, the task is terminated. Both promises and the closure returned by AsyncContext.wrap() hold references to the map (whichever is current at that time), so they keep the map alive.

jridgewell avatar May 03 '23 22:05 jridgewell

I realize now that the old code doesn't model parent-child relationships explicitly, so a parent task will terminate before child tasks do. https://gist.github.com/jridgewell/a3a4144b2b7cec41bc9da255d3b3945a models termination lifetimes correctly.

jridgewell avatar May 03 '23 22:05 jridgewell

Thanks for the clarifications. Really helpful.

So a task is some global state which is implict created by an individual instance of AsyncContext. While I fully understand that internally a single __storage__ (or task) exists I wonder if it should be visible to users.

Adding one more AsyncContext instance (e.g. load OTel, choose some framework which uses AsyncContext internally,...) will result in change task creation/lifetime for all users.

Usually I would expect that individual AsyncContext instances are isolated. In special observability tools like OTel usually want to avoid any visible side effect.

I would also expect two variants of snapshot():

  1. a static variant to be used by e.g. queuing frameworks to implement them "async context propagation aware" (they don't use/have an instance of AsyncContext by themself)
  2. a member variant to allow individual AsyncContext users to tune the propagation for their needs (like that one implemented by OTel)

Well, the second one can be implemented in userland therefore not required to be included in AsyncContext.

Flarna avatar May 04 '23 05:05 Flarna

I don't think GC is an option here. Apologies on my delay, I plan to work on a sketch here soon as @jridgewell said.

littledan avatar May 04 '23 14:05 littledan

Looking forward to the sketch, but based on how GC already plays a role in this, I don't see how a user land API for task termination could be anything else than related to FinalizationRegistry.

mhofman avatar May 04 '23 14:05 mhofman

Adding one more AsyncContext instance (e.g. load OTel, choose some framework which uses AsyncContext internally,...) will result in change task creation/lifetime for all users.

Not quite. The only way to extend the lifetime of task is to keep a pending promise around (or snapshot). Whether that promise is created as someAsyncStuff() you wrap it in your own ctx.run(1, () => someAsyncStuff()) doesn't change anything.

By calling into your code, I've made your code part of my task execution. Until your code completes, my task isn't complete. It doesn't matter if your code has its own AsyncContext instance internally.

2. a member variant to allow individual AsyncContext users to tune the propagation for their needs (like that one implemented by OTel)

I tried looking through the OTel bind code. It seems like it's just:

class AsyncContext {
  bind(val, cb) {
    return this.run.bind(this, val, cb);
  }
}

That seems fine, but I don't think this would change task termination.

jridgewell avatar May 04 '23 21:05 jridgewell

if the parent/child relationship is kept and the internal context object is the representation of the task it should work.

Not sure how useful it is as a simple setInterval for a background activity would keep a task alive forever but well that's actually correct behavior. There seems to be no way to stop task prolongation because even run(undefined, cb) creates a new child task, just one entry in the map is set to undefined.

~~I'm also not sure if it would be problematic if a task is a child of itself (maybe with other tasks in-between) but I guess this can happen if snapshot is used and implicit propagation happens at the same time.~~ Edit: actually this can't happen. use of snapshot is switching context not creating a new child task.

Note also that this form of task termination is fundamental different to the current destroy hook in node.js. That hook operates on individual async resources (individual promise, nextTick) and any sort of prolongation/propagation/... is up to users. Note that I'm not proposing here to use the node.js destroy hook.

Flarna avatar May 05 '23 07:05 Flarna

Given that if an AsyncContext.Snapshot that hasn't been GCed could be used, surely the only way earlier detection would be possible is if there is some .discard or such on Snapshot right? (This would be called automatically if the snapshot is GCed of course).

e.g We could have a run function that tracks termination like:

let snapshot;

variable.run(
    "SOME_VALUE",
    () => {
        snapshot = new AsyncSnapshot();
    },
    // Optional callback that is called when the variable is no longer reachable in this state
    () => {
        console.log("Task terminated");
    },
);

setTimeout(() => {
    // At this point the ref count of AsyncContext.Snapshots that refer to the active
    // mapping is zero so we can call the termination callback now
    snapshot.discard();
}, 1000);

Effectively the stack when executing this would be initially:

{ 
   [[AsyncContextMapping]]: { 
       variable => { value: undefined, snapshots: [] } 
   }
}

when variable.run is executed we transition to:

{ 
   [[AsyncContextMapping]]: { 
       variable => { 
           value: "SOME_VALUE",
           terminationHandler: theTerminationHandler,
           snapshots: [
               // This is the snapshot that variable.run creates
               ANONYMOUS_SNAPSHOT_CREATED_BY_RUN,
           ],
       } 
   }
}

then when snapshot = new AsyncSnapshot() is executed we have:

{ 
   [[AsyncContextMapping]]: { 
       variable => { 
           value: "SOME_VALUE",
           terminationHandler: theTerminationHandler,
           snapshots: [
               // This is the snapshot that variable.run creates
               ANONYMOUS_SNAPSHOT_CREATED_BY_RUN,
               // This is the snapshot created by the user
               snapshot,
           ],
       } 
   }
}

now .run has finished executing the function so the anonymous snapshot can be removed:

{ 
   [[AsyncContextMapping]]: { 
       variable => { 
           value: "SOME_VALUE",
           terminationHandler: theTerminationHandler,
           snapshots: [
               // This is the snapshot created by the user
               snapshot,
           ],
       } 
   }
}

then sometime later when snapshot.discard() is called we remove it from the list:

{ 
   [[AsyncContextMapping]]: { 
       variable => { 
           value: "SOME_VALUE",
           terminationHandler: theTerminationHandler,
           snapshots: [],
       } 
   }
}

and because snapshots is empty, there are no longer any referencing snapshots and so we can call the termination handler.

Of course we could also make this more granular and provide a full observer for when snapshots are created and destroyed:

variable.run("SOME_VALUE", () => {

}, {
    snapshotCount: 0,
    onSnapshot() {
        this.snapshotCount += 1;
    },
    onSnapshotDiscard() {
        this.snapshotCount -=1;
        if (this.snapshotCount === 0) {
            // Whatever for context termination
        }
    },
});

though this feels unneccessary as the only thing we can really do with this information is a ref-count which is presumably how the host would implement this anyway.


For host APIs that use this for accurate tracking, they can just discard the associated snapshot in the job mapping.

i.e. setTimeout could be written like:

function setTimeout(cb, delay) {
    const jobCallbackRecord = HOST_MAKE_JOB_CALLBACK(cb);
    
    scheduleInSystemSomehow(() => {
        HOST_CALL_JOB_CALLBACK(jobCallbackRecord);
        // Discard the snapshot now
        jobCallbackRecord.[[asyncContextSnapshot]].discard();
    }, delay);
}

Jamesernator avatar Jul 01 '23 07:07 Jamesernator

I made a little proof of concept of what I described above using Node's async_hooks. A small example demonstrates that this pattern can in fact observe essentially the exact timing of when all snapshots to a variable are discarded:

import AsyncContext from "./async-context-with-termination/AsyncContext.js";

const variable = new AsyncContext.Variable();

// DO NOT REMOVE THIS LINE: https://github.com/nodejs/node/issues/48651
console.log("START");

variable.runWithTermination(
    "TEST",
    () => {
        // This does get printed basically at the same time as the timeout callback is run
        console.log(`terminated at: ${Date.now()}`);
    },
    () => {
        setTimeout(() => {
            console.log(`callback at: ${Date.now()}`);
        }, 1000);
    },
);

Jamesernator avatar Jul 12 '23 11:07 Jamesernator

Right, I believe @littledan idea is that basically there are 2 kinds of continuations:

  • One shot, where some function is "wrapped" to capture the current context, and called once (or never)
  • Reusable, e.g. a snapshot, or a wrapped function which can be called multiple times

A deterministic termination API would perform some refcount style tracking of the outstanding continuations:

  • when a one shot continuation is created, increment, when it's called, decrement
  • when a reusable continuation is created, increment, and possibly introduce a way to discard reusable continuations which would also decrement.

Such an API would be deterministic, but will miss the case where all outstanding continuations (both one shot and reusable) are collected, and the context is effectively terminated.

I am ok with such an API if it fulfills the need of authors. If a program needs to catch the outstanding continuation collection case, they can already use an object as the context value, and setup a FinalizationRegistry to observe it.

mhofman avatar Jul 12 '23 13:07 mhofman

but will miss the case where all outstanding continuations (both one shot and reusable) are collected, and the context is effectively terminated.

Well for some prior art, Node's AsyncResource does automatically destroy resources on GC unless you tell it not to.

If a program needs to catch the outstanding continuation collection case, they can already use an object as the context value, and setup a FinalizationRegistry to observe it.

I don't know this is actually true though, like if you create a FinalizationRegistry for some snapshots, there is no way to actually then discard them:

const registry = new FinalizationRegistry(holdings => {
    // We don't actually have the AsyncContext.Snapshot to call discard here as it's been
    // collected so there's no way to discard it
});

const snapshot = new AsyncContext.Snapshot();

// There's no useful holdings we can provide here
registry.register(snapshot, ???);

Only the implementation of Snapshot could actually do anything on finalization (which is also exactly what my proof of concept does) as only the host has access to the associated mapping stored in the snapshot.

Jamesernator avatar Jul 12 '23 15:07 Jamesernator

Well for some prior art, Node's AsyncResource does automatically destroy resources on GC unless you tell it not to.

Observability of GC through this API is one thing that I would be opposed to happen for a standardized feature as I stated earlier.

I don't know this is actually true though, like if you create a FinalizationRegistry for some snapshots

It's not the snapshot that you need to register, but the object value provided to variable.run(). The program can arrange that this object not be referenced anywhere but through the AsyncContext mechanism, which means the only path to observe its value would be through variable.get(), and if the context is terminated, it can no longer be observed and thus collected.

mhofman avatar Jul 12 '23 20:07 mhofman

The program can arrange that this object not be referenced anywhere but through the AsyncContext mechanism, which means the only path to observe its value would be through variable.get(), and if the context is terminated, it can no longer be observed and thus collected.

Okay yes that makes sense.

I suppose it gives a decent opportunity too for contexts that care about context termination to determine that the timing was not actually accurate as some API was used that didn't explictly discard their snapshot when they were done.

If snapshot.discard() (or similar) is added, it'd probably be a good idea to make Snapshot a disposable so that users can more easily close them automatically (e.g. as part of a disposable stack in a class).

Jamesernator avatar Jul 12 '23 21:07 Jamesernator

At this point, I think we should conclude that this idea is for a separate proposal, if it ever makes sense. It's very hard to define this concept!

littledan avatar Apr 08 '24 21:04 littledan

I actually had a solution to this a whole bunch of years ago which I made when much earlier iterations of context management were being proposed but I think I sadly failed to explain it effectively at the time. The general idea was to have a "Task" type which wraps continuations like callbacks or promise fulfillment handlers and essentially recursively reference-counts async branches created within the sync execution of each continuation. When a part of the branch resolves it'd reduce the counter for the level it originated from by one and when that count reaches zero it'd propagate upward another level until eventually everything in the async execution graph has resolved. Here's a rough example:

const resolvers = new WeakMap()
const tasks = new WeakMap()
let current = undefined

function incTasks(task) {
  if (!tasks.has(task)) return 0
  const count = tasks.get(task) + 1
  tasks.set(task, count)
  return count
}

function decTasks(task) {
  if (!tasks.has(task)) return 0
  const count = tasks.get(task) - 1
  tasks.set(task, count)

  // If the count reaches zero, run the resolver
  if (count === 0) {
    const resolve = resolvers.get(task)
    if (resolve) resolve(task)
  }

  return count
}

function taskify(task, resolve) {
  if (current) {
    resolve = nestResolver(current, resolve)
  }

  resolvers.set(task, resolve)
  tasks.set(task, 0)
  incTasks(task)

  return function wrapped(...args) {
    const prior = current
    current = task
    try {
      return task.apply(this, args)
    } finally {
      current = prior
      decTasks(task)
    }
  }
}

function nestResolver(current, resolve) {
  incTasks(current)
  return (task) => {
    resolve(task)
    decTasks(current)
  }
}

const asyncTask = taskify(function outer(foo, bar) {
  console.log('Hello, World!', foo, bar)

  setTimeout(taskify(function sub1() {
    console.log('I am the first subtask')
  }, (task) => {
    console.log('resolved first setTimeout', task)
  }), 500)

  setTimeout(taskify(function sub2() {
    console.log('I am the second subtask')
  }, (task) => {
    console.log('resolved second setTimeout', task)
  }), 250)
}, (task) => {
  console.log('resolved outer', task)
})

asyncTask('foo', 'bar')

Which produces the output:

Hello, World! foo bar
I am the second subtask
resolved second setTimeout [Function: sub2]
I am the first subtask
resolved first setTimeout [Function: sub1]
resolved outer [Function: outer]

The use of WeakMaps to store the resolver functions and task counts here is not important. That'd probably be better served by an actual wrapper type. That can act like the original function. (Or promise if an equivalent promise-handling thing was written)

I'm sure this can be done much more efficiently if done properly at the VM level, but this is just a super rough sketch I threw together from memory of a proposal a decade ago soooo... 🤷🏻

Qard avatar Apr 10 '24 23:04 Qard

@Qard, that sounds a lot like the deterministic ref counting that @littledan had proposed.

At this point, I think we should conclude that this idea is for a separate proposal, if it ever makes sense. It's very hard to define this concept!

One problem with delaying to a future proposal is that any existing usage of Snapshot would cause context "leaks". As I mentioned, I would be opposed to non-deterministic context termination, and as such Snapshot instances should be a managed resource, most likely with a @@dispose behavior.

Is there any drawback to specifying an explicit disposal / revocation mechanism for snapshots without specifying a way to observe the termination of the context?

mhofman avatar Apr 11 '24 05:04 mhofman

Yes, basically the same idea. In my original implementation I solved the case of repeatable tasks by just adding one extra increment to represent the handle for the task which is only decremented when the task is cleared/unscheduled.

And yeah, I think it's easy enough to specify a mechanism of following the async branches to track resolution without specifying yet what to actually do when a branch resolves so we can make a later decision on what to do with that information.

An example use case where this could be valuable is Node.js domains--the domain can be held in a store through all the async branches to receive any uncaught exceptions, if the error isn't handled in some way by the time all the branches resolve it can then raise the error out to any outer domain or to the process-level exception handler.

Qard avatar Apr 11 '24 09:04 Qard

Is there any drawback to specifying an explicit disposal / revocation mechanism for snapshots without specifying a way to observe the termination of the context?

We can specify Snapshhot.p[Symbol.dispose] now, but I'm curious how it would behave if you never register it with using/DisposableStack? Like, I imagine it'll be used with a queued callbacks, do they need to register the snapshot before calling snapshot.run(cb)? What about if they want to reuse the snapshot multiple times across different lexical scopes?

If we just treat these as hanging snapshots that prevent termination, I guess that works? It's just so brittle if you forget. Maybe devtools can help here.

jridgewell avatar Apr 11 '24 14:04 jridgewell

Maybe devtools can help here.

A nudge from devtools is definitely what I'm looking for here.

mhofman avatar Apr 11 '24 17:04 mhofman

We need to continue to discuss this. I think we're open to the idea, but having a disposable snapshot that persists means that whatever object is holding the snapshot now needs to be disposable itself, and its holder, etc.

It might be ok, particularly with globally registered libraries that don't really go get collected. But then there's the question of when does a Snapshot.wrap(fn) wrapped function dispose of its internal mapping?

jridgewell avatar Apr 16 '24 22:04 jridgewell

I'm wondering, how often does a wrapped function need to be re-usable. Could we get away with a wrapOnce and new Snapshot() only? The wrapOnce function would automatically dispose after invocation, and the snapshot instance would be a disposable.

mhofman avatar Apr 17 '24 00:04 mhofman

If it is a call once use case this wrapped callback is no longer referenced afterwards and collected including it's wrapped context. Otherwise there would be a bigger leak, not just the wrapped context. But there are several streaming-like usecases where a callback might be called more then once.

Flarna avatar Apr 17 '24 05:04 Flarna