jco Selecting instantiation mode at compile time is limiting

I've just encountered this error message in Chrome:

bundle.js:487 Uncaught RangeError: WebAssembly.Instance is disallowed on the main thread, if the buffer size is larger than 8MB. Use WebAssembly.instantiate, or use the flag `--enable-features=WebAssemblyUnlimitedSyncCompilation`.
    at bundle.js:487:28
    at Application.instantiate (bundle.js:4439:28)
    at Application.execute (bundle.js:483:30)
    at Application.run (bundle.js:513:17)
    at <anonymous>:1:7

That is ... a problem. I want to provide an option to do everything synchronously (except for the necessary network requests that can be run in advance), provided that you are running in some context where blocking is OK (like a web worker). But I think that completely removing the option to run outside of a web worker, even if it's generally unwise, is a regression, if for no other reason that it makes it impossible to quickly load something in DevTools and experiment.

I think I want JCO to allow me to make this choice at runtime, not at compile-time.

Feb 13 '24 06:02 whitequark

I guess we could create an --instantiation hybrid mode that calls the function, and if it returns a promise conditionally awaits it. Would gladly take a PR along those lines, and if it's not too big of a diff we could even make it the default instantiation mode.

Feb 13 '24 17:02 guybedford

@guybedford So while working on this I bumped into a question, namely, why do we have separate getCoreModule and instantiateCore = <some default behavior>? This seems to serve no purpose and creates a lot of complexity because there are three possible cases, all of which may be desirable in some circumstances:

getCoreModule returns a Promise<Response> and instantiateCore is WebAssembly.instantiateStreaming (which is actually slower than it has to be with --instantiation async because you first await all getCoreModule promises and only then start instantiation, which leaves parallelism opportunities on the table--that can get tangible with how many core modules jco outputs!)
getCoreModule returns an ArrayBuffer or Promise<ArrayBuffer> and instantiateCore is WebAssembly.instantiate. Works well if you don't necessarily have a Response around (although you can always construct a Response out of an ArrayBuffer if you want), but still means async instantiation, so not always desirable or viable.
getCoreModule returns an ArrayBuffer (not a promise) and instantiateCore is new WebAssembly.Instance. Works well in fully synchronous circumstances, but doesn't work in many cases on the main thread, and is generally undesirable unless you have to do it this way.

A core observation here is that getCoreModule and instantiateCore have a contract that's essentially private between them; the generated core will await the result of getCoreModule but if it's not a Thenable then nothing happens, so this has no effect other than reducing parallelism in case (1).

I propose merging the two into a single instantiateCoreModule function. In this case the signature of the module will be:

type InstantiateCoreFn = (filename: string) => WebAssembly.Instance | Thenable<WebAssembly.Instance>;
export function instantiate(instantiateCoreModule: InstantiateCoreFn, imports: object): Exports | Promise<Exports>;

with the contract that if any of instantiateCoreModule invocations return a Promise then instantiate returns a Promise<Exports>, otherwise it returns Exports. This contract can even be exposed in TypeScript via overloads! Something like this should work:

type InstantiateCoreFnAsync = (filename: string) => WebAssembly.Instance | Thenable<WebAssembly.Instance>;
type InstantiateCoreFnSync = (filename: string) => WebAssembly.Instance;

export function instantiate(instantiateCoreModule: InstantiateCoreFnSync, imports: object): Exports;
export function instantiate(instantiateCoreModule: InstantiateCoreFnAsync, imports: object): Exports | Promise<Exports> { /* implementation */ }

What do you think? Also, how much backwards compatibility with existing instantiation schemes do we need?

Feb 21 '24 00:02 whitequark

Oh. That's of course not possible because later modules can depend on earlier ones.

Feb 21 '24 00:02 whitequark

OK so this actually looks to be quite challenging with JCO's architecture. To recap, right now the generated code for --instantiate async looks like this:

export async function instantiateAsync(getCoreModule, imports, instantiateCore) {
  const module0 = getCoreModule('yosys.core.wasm');
  const module1 = getCoreModule('yosys.core2.wasm');
  const module2 = getCoreModule('yosys.core3.wasm');
  const module3 = getCoreModule('yosys.core4.wasm');
  //...
  Promise.all([module0, module1, module2, module3]).catch(() => {});
  
  ({ exports: exports0 } = await instantiateCore(await module0));
  ({ exports: exports1 } = await instantiateCore(await module1, { exports0 }));
  ({ exports: exports2 } = await instantiateCore(await module2, { exports0, exports1 }));
  ({ exports: exports3 } = await instantiateCore(await module3, { exports0, exports1, exports2 }));
  return exports3;
}

To do hybrid instantiation it should look like this:

function instantiateHybrid(moduleOrPromise, imports, fn) {
  if (moduleOrPromise instanceof WebAssembly.Module) {
    return fn(new WebAssembly.Instance(moduleOrPromise, imports))
  } else {
    return WebAssembly.instantiateStreaming(moduleOrPromise, imports).then(fn);
  }
}

export function instantiateAsync(getCoreModule, imports) {
  const module0 = getCoreModule('yosys.core.wasm');
  const module1 = getCoreModule('yosys.core2.wasm');
  const module2 = getCoreModule('yosys.core3.wasm');
  const module3 = getCoreModule('yosys.core4.wasm');
  //...
  Promise.all([module0, module1, module2, module3]).catch(() => {});
  
  return instantiateHybrid(module0, {}, ({ exports: exports0 }) => {
    return instantiateHybrid(module1, { exports0, exports1 }, ({ exports: exports1 }) => {
      return instantiateHybrid(module2, { exports0, exports1, exports2 }, ({ exports: exports2 }) => {
        return instantiateHybrid(module3, { exports0, exports1, exports2, exports3 }, ({ exports: exports3 }) => {
          return exports3;
        });
      });
    });
  });
}

I am sure you can see why I found this challenging to implement with the current JCO codegen.

Feb 21 '24 01:02 whitequark

You might ask, "why not simply extract the Module from a fulfilled Promise<Module> when getCoreModule returns one? I thought I'd do that at first, but it seems that there is no way to do this whatsoever, as [[PromiseResult]] is a private field you can't get at from JS, and awaiting or thening a fulfilled promise (even a trivially fulfilled one, created with e.g. Promise.resolve) always happens on the next microtick, and so it will never be synchronous.

It does turn out that you can somewhat sidestep this with a custom Thenable that isn't a Promise like this:

await { then(onFulfilled) { onFulfilled(1); } } // => 1
typeof (await { then(onFulfilled) { onFulfilled(1); } }) // => 'number'

But you can only use await in an async function (or toplevel), and an async function always returns a Promise, so in the end, no matter what you do inside, you're still left with a Promise that you can't do anything with synchronously.

Feb 21 '24 01:02 whitequark

Your solution looks exactly correct to me yeah, unless we entirely emit two instantiation wrappers - instantiateAsync and instantiateSync where you select which one to use at the top-level for a runtime decision.

Feb 21 '24 20:02 guybedford

unless we entirely emit two instantiation wrappers - instantiateAsync and instantiateSync where you select which one to use at the top-level for a runtime decision.

This would be by far the easiest, but it could result in very substantial code bloat.

Feb 21 '24 21:02 whitequark

Surely not more than a few hundred lines of JS? Or do you mean for the Rust code?

Feb 21 '24 21:02 guybedford

Surely not more than a few hundred lines of JS?

I was thinking of that yeah. I suppose that compared to the megabyte-sized wasm blobs that I ship it's not really important.

Feb 21 '24 21:02 whitequark

Making both sync and async versions of a function with a single implementation is exactly what the gensync package does, incidentally; it may allow you to avoid the code bloat.

Feb 25 '24 01:02 bakkot

GenSync solves this problem for you. All you have to do is replace async function f() {} with let { sync: syncF, async: asyncF } = function* f() {}, replace all awaits with yield*s, and wrap up any possibly-asynchronous functions you want to call with gensync({ sync: f, async: f }).

That seems like something that JCO itself would have to do?

Feb 25 '24 02:02 whitequark

Sorry, yes, I was suggesting using it in JCO, probably inlined into the generated file, to provide the instantiateSync and instantiateAsync methods without needing to duplicate the implementation.

Feb 25 '24 02:02 bakkot

Ah OK. Yes, I could probably use the approach of a generator to implement this. I actually completely forgot generator functions exist in JS--thank you, this is helpful.

Feb 25 '24 02:02 whitequark

I put together an initial experiment here in https://github.com/bytecodealliance/jco/pull/413, thanks for the idea @bakkot.

@whitequark I'm not completely sure of the exact use case, and I don't personally have any need for this, but let me know if you have any thoughts further on it.

Mar 23 '24 00:03 guybedford

Landed in https://github.com/bytecodealliance/jco/pull/413. Then the default instantiation can be switched to sync in a future major.

Jun 28 '24 00:06 guybedford

Thanks! I've rebuilt YoWASP packages with jco 1.3.0 and can confirm everything works perfectly.

You can try it for yourself:

const { runYosys } = await import('https://cdn.jsdelivr.net/npm/@yowasp/[email protected]/gen/bundle.js')
await runYosys(['--version'])
runYosys(['--version'], {}, {synchronously:true})

Jun 28 '24 16:06 whitequark