html Ergonomic way to move data between workers

What problem are you trying to solve?

Right now, call-and-response communications with a worker are pretty cumbersome. Complex libraries have been created to try and make this easier.

What solutions exist today?

Message ports, and utilities like comlink.

How would you solve it?

This builds on the blank worker proposal.

ShadowRealm is now stage 3, and I think it has ideas we could borrow.

const worker = new Worker('about:blankjs', { type: 'module' });

// Import into worker:
await worker.importValue(specifier);

// Import and get export:
const value = await worker.importValue(specifier, exportName);

importValue would throw if the worker is not type: 'module'.

The export can be anything structured cloneable, and it can also be, or include, functions.

When functions are called, the args are cloned and the function in the worker is called with those args. The return value is cloned, and used to resolve the function on the caller's side.

For example:

worker-utils.js

export function createNumbersArray(length) {
  return Array.from({ length }, (_, i) => i);
}

index.js

const worker = new Worker('about:blankjs', { type: 'module' });
const createNumbersArray = await worker.importValue('./worker-utils.js', 'createNumbersArray');

const numbersArray = await createNumbersArray(3);
// [0, 1, 2]
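
The clone-in, clone-out behaviour described above can be approximated on a single thread with structuredClone(), which exists today in browsers and Node.js. This is only a sketch of the semantics: callAcrossBoundary and appendZero are hypothetical names, not part of the proposal, and a real call into a worker would be asynchronous.

```javascript
// Hypothetical helper: mimics the proposed call semantics without a real worker.
// Arguments are cloned on the way in, the return value is cloned on the way out.
function callAcrossBoundary(fn, ...args) {
  const clonedArgs = structuredClone(args);
  const result = fn(...clonedArgs);
  return structuredClone(result);
}

// Stand-in for a function exported by the worker module.
function appendZero(numbers) {
  numbers.push(0); // Mutates the worker-side copy only.
  return numbers;
}

const original = [1, 2];
const returned = callAcrossBoundary(appendZero, original);

console.log(original); // [1, 2]
console.log(returned); // [1, 2, 0]
```

Because both directions are cloned, the caller's array is untouched even though the function mutates its argument.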

Anything else?

It'd be nice if certain values could be marked as "transferable" rather than cloneable. Transferable streams would benefit from this too.

Jan 19 '24 14:01 jakearchibald

Very interesting, would be great to see something like this. While having this shape of API for the blank worker use case makes a lot of sense, for the use case where the worker itself has a "top-level" module definition, it might also make sense to simplify the interface further to some kind of top-level exports use:

const worker = new Worker('./mod.mjs', { type: 'module' });
const { createNumbersArray, anotherExport } = worker.getExports(['createNumbersArray', 'anotherExport']);

Jan 19 '24 21:01 guybedford

The dynamic behavior allowed by importValue()'s exportName argument makes sense in the context of shadow realms which are heavy on eval-semantics already, but I am concerned it would lend itself to a style that makes it harder for static analysis for security purposes, like for auditing WebExtensions where there's definitely problems with bad actors trying to do tricky things via obfuscated dynamism. Like, it's fine if there's always a string literal there, but the API shape feels like it would be just as idiomatic to use a variable which opens up all kinds of avenues of dynamism, and the need for code auditors to potentially re-litigate the exact same debate about how submitted code should use the API. In particular, I can imagine developers wanting to use a for loop there.

I feel like there had previously been discussion about enabling something like await worker.remoteImport('the-script.mjs') that could have similar semantics to import(), in particular returning a module namespace object? Are there major spec complexities preventing a solution like that? While this of course still allows nefarious dynamic behavior, it's already the identical problem to import() and allows consistent policy enforcement, like accessing the module via remoteModule.createNumbersArray().

The downside with this alternative is of course that it would potentially do significantly more work than required if the imported script has more exported functions than the caller wants to call. Although obviously it's possible with modules to not export more than is actually desired to proxy, and this approach could arguably be beneficial to static-analysis-based auditors since it would encourage the code authors to limit their number of exports because their code would perform worse because of the wasted exports.

Jan 20 '24 01:01 asutherland

The dynamic behavior allowed by importValue()'s exportName argument makes sense in the context of shadow realms which are heavy on eval-semantics already, but I am concerned it would lend itself to a style that makes it harder for static analysis for security purposes, like for auditing WebExtensions where there's definitely problems with bad actors trying to do tricky things via obfuscated dynamism.

This is a wider concern with string specifiers as a dynamic import mechanism. The problem would be generally solved for all forms of dynamic import (including this one) with the module expression blocks proposal.

Jan 20 '24 03:01 Jamesernator

There are cases where blocks are handy, but I don't think they should be required to improve worker communication. Having worker code in another file is usually a benefit.

Jan 22 '24 12:01 jakearchibald

@guybedford yeah, I agree that would be handy (although getExports would need to return a promise). I was trying to avoid creating an API shape that was so different to shadow realms, but maybe that doesn't matter.

Jan 22 '24 12:01 jakearchibald

Edit: I thought a bit more about this and I think I prefer markForTransfer, so I'll hide this comment.

When functions are called, the args are cloned and the function in the worker is called with those args. The return value is cloned, and used to resolve the function on the caller's side.

I haven't formed a strong opinion on this question yet, but from my experience using Comlink (in production, real-world use cases), I think it might make more sense for the transferrable parts of the return value to be transferred (rather than cloned) by default.

Some loose thoughts:

- I don't think there's a strong argument for consistency with postMessage here (clone-by-default) because this is a significantly higher-level interface. It's different enough from the dev's perspective that we can design from a fresh slate I think.
- Wrapping the return value in structuredClone() is easy/obvious, and of course allows for partial cloning via structuredClone(obj, {transfer:[obj.foo.buffer]}) - i.e. nothing new for the dev to learn.
  - Question: How obvious would bugs be to devs who write code that assumes clone-by-default? Are there "dangerous"/subtle bugs for common usage patterns, or will they get an immediate/obvious error message in the vast majority of cases?
- Removes the need for something like { value: port2, transferList: [port2] } (akin to returning Comlink.transfer({...})) which makes functions less "isomorphic" RE main thread vs worker thread usage. This was a bit annoying for me in my usage of Comlink (requires an extra input param to tell the function I do/don't want a Comlink.transfer object).
- How common is it for a module to export a function that returns internal state? I.e. stuff that said exported function still "cares" about? In my Comlink use cases the exported functions tend to be stateless "work horses". For this usage pattern I think transfer-by-default makes sense.
  - Clone-by-default makes sense for the parameters passed into Comlink-type interface since the main thread/module does often pass internal state out to "work horse" functions.
  - But this whole point may be moot because other usage patterns may be/become dominant.
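
For reference, the difference between cloning and transferring can be observed with structuredClone() today. The snippet below is a small sketch with made-up variable names; it shows existing platform behaviour only and says nothing about which default the proposal should pick.

```javascript
// Clone: the caller keeps a usable copy of the bytes.
const kept = new Uint8Array([1, 2, 3]);
const cloned = structuredClone({ foo: kept });

// Transfer: the underlying ArrayBuffer moves, and the original is detached.
const moved = new Uint8Array([4, 5, 6]);
const holder = { foo: moved };
const transferred = structuredClone(holder, { transfer: [holder.foo.buffer] });

console.log(kept.byteLength); // 3
console.log(cloned.foo[0]); // 1
console.log(moved.byteLength); // 0, because its buffer was detached
console.log(transferred.foo[0]); // 4
```

Code that keeps using a value after handing it over would silently see an empty view in the transfer case, which is the kind of bug the question above is asking about.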

Relevant comment from Jamesernator before this issue got forked off the Blank worker proposal issue.

Jan 23 '24 13:01 josephrocca

I don't think we should get too bogged down in the transferable issue, but I think the solution here should be the same as it would be for transferred streams.

Something like:

const blob = new Blob(…);
markForTransfer(blob);

At this point blob is now detached, but the realm has a "marked for transfer" set containing the blob.

Then, in StructuredSerializeWithTransfer, if an object is in the "marked for transfer" set, it's transferred, and removed from the set.
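
A rough userland approximation of that idea, to make the mechanics concrete. Everything here is an assumption for illustration: markForTransfer is only the name floated above, collectMarked and serializeWithMarks are hypothetical helpers, and structuredClone() stands in for StructuredSerializeWithTransfer. Unlike the description above, this sketch detaches the object at serialization time rather than at mark time, and it uses an ArrayBuffer because that is something structuredClone() can transfer today.

```javascript
// Userland stand-in for the realm's "marked for transfer" set.
const markedForTransfer = new Set();

// Hypothetical: records the object so the next serialization transfers it.
function markForTransfer(object) {
  markedForTransfer.add(object);
}

// Walks plain objects and arrays, moving marked objects into a transfer list.
function collectMarked(value, transferList = [], seen = new Set()) {
  if (value === null || typeof value !== 'object' || seen.has(value)) {
    return transferList;
  }
  seen.add(value);
  if (markedForTransfer.has(value)) {
    markedForTransfer.delete(value); // A mark is consumed by one transfer.
    transferList.push(value);
    return transferList;
  }
  if (ArrayBuffer.isView(value)) {
    // A typed array is transferred via its underlying buffer.
    return collectMarked(value.buffer, transferList, seen);
  }
  for (const child of Object.values(value)) {
    collectMarked(child, transferList, seen);
  }
  return transferList;
}

// Hypothetical stand-in for StructuredSerializeWithTransfer.
function serializeWithMarks(value) {
  return structuredClone(value, { transfer: collectMarked(value) });
}

const buffer = new ArrayBuffer(16);
markForTransfer(buffer);
const received = serializeWithMarks({ payload: buffer, label: 'pixels' });

console.log(buffer.byteLength); // 0, the marked buffer was transferred
console.log(received.payload.byteLength); // 16
```

Unmarked values passed through serializeWithMarks are cloned as usual, so the call site stays the same whether or not anything was marked.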

Having one API that's transfer by default seems weird.

Jan 23 '24 13:01 jakearchibald

I like the markForTransfer idea, it definitely feels ergonomic in a way that transfer lists do not and where transfer lists would be a real problem here. In general I think lessons from CORBA and other APIs are that it's hard/inadvisable to hide RPC boundaries and so it would be quite reasonable for a proposal like this to make it more ergonomic to perform what amounts to RPC, but that code still would need to account for that, including participating in semantics-impacting decisions like marking things for transfer.

Have there been similar discussions of this proposal elsewhere, and in particular that have been TAG reviewed? I'm having trouble finding other examples of markForTransfer specifically.

I should also note:

- Per the File API WebIDL, Blobs and Files are not currently marked Transferable.
- Blob.close() was explicitly removed from the spec and markForTransfer presumably amounts to the same thing, as it seems like one would specify the transfer set to be cleared at the conclusion of the task, and if it wasn't transferred, it's now effectively closed. Although this potentially creates foot-guns if the method ever goes async after calling markForTransfer. One would really want an explicit and clear lifetime if allowing the pending transfer to outlive a request. Like let t = new TransferDecorator(); t.markForTransfer(transferrableThing); await somePromise; return t. But that still sounds like it's similar to the comlink approach?

Jan 23 '24 18:01 asutherland

In chatting with @guybedford, a challenge across JavaScript runtimes right now is how to run an untrusted guest via WebAssembly. The current design of that interface seems to be more aligned with a WASM host and guest being within more-or-less the same trust boundary. A WASM guest can deny service to the event-loop of the host and the host has nothing that it can do to prevent that.

For a host to prevent that, a parent controlling event loop is required. We can accomplish that by having a Worker be the host of the WASM instance. The event loop that spawns the worker can enforce deadlines and cancellations by managing the lifetime of the whole Worker.
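
A minimal sketch of that kind of deadline enforcement, assuming one worker per call and a worker that replies with a single message when the guest finishes. runWithDeadline and its timers parameter are hypothetical names chosen for this example; Worker.terminate(), postMessage() and setTimeout() are existing APIs.

```javascript
// Hypothetical helper: resolves with the worker's first reply, or terminates
// the worker and rejects if no reply arrives within deadlineMs.
// `timers` is injectable only so the timeout path can be exercised in tests.
function runWithDeadline(worker, message, deadlineMs, timers = globalThis) {
  return new Promise((resolve, reject) => {
    const timer = timers.setTimeout(() => {
      worker.terminate(); // Stops the guest even if it never yields.
      reject(new Error(`No reply within ${deadlineMs}ms`));
    }, deadlineMs);

    worker.onmessage = (event) => {
      timers.clearTimeout(timer);
      worker.terminate(); // One worker per call in this sketch.
      resolve(event.data);
    };

    worker.postMessage(message);
  });
}
```

The controlling thread never runs guest code itself, so a guest stuck in a tight loop cannot block the timer that ends it. The message plumbing is still written by hand here, which is the part this proposal would aim to make more ergonomic.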

However, the ergonomics of this leave a lot to be desired; packaging re-usable code across runtimes so that some is run in a worker and some in the main thread is challenging. Setting up the communication channels in a cross-platform way is also challenging for the same reasons motivating this discussion.

It seems like a version of this proposal might offer a more ergonomic way of handling this intermediary Worker, which is why I bring up the use-case.

Feb 21 '24 00:02 ggoodman
