node icon indicating copy to clipboard operation
node copied to clipboard

ESM loaders cannot be defined via `Worker` option `execArgv` in v20

Open cjihrig opened this issue 1 year ago • 60 comments

Version

20.0.0 (tested on main at 36e4e3d86b31de7f38125aacad810e40295fd2ae too)

Platform

macOS but probably all platforms

Subsystem

esm

What steps will reproduce the bug?

The following works in Node 18 and 19, but not 20:

// main.mjs
import { Worker } from 'node:worker_threads';

new Worker('./worker.js', {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
// worker.js
'use strict';

async function main() {
  await import('node:fs');
}

main();
// loader.mjs
export function resolve (specifier, context, nextResolve) {
  throw new Error('boom');
}

Run: node main.mjs. In Node 18 and 19, the exception in the loader is thrown. In Node 20 it is not. The problem is not unique to throwing. I haven't been able to see console.log() or any other indication that the loader is being called.

How often does it reproduce? Is there a required condition?

100% of the time for me.

What is the expected behavior? Why is that the expected behavior?

I expect it to work as it did in earlier versions of Node. This is the expected behavior because right now it doesn't seem to work at all.

What do you see instead?

See the description of the bug above.

Additional information

Just a guess, but I'm assuming this is related to https://github.com/nodejs/node/pull/44710.

cjihrig avatar Apr 27 '23 14:04 cjihrig

Another side effect of ES loaders to a separate thread: updating require.extensions in an es loader no longer has any effect, meaning that node --loader=ts-node/esm no longer functions properly to load both cjs and mjs typescript.

This makes it significantly more challenging to have loaders that work for both esm and cjs, effectively breaking the esm on-ramp for those of us supporting both modes, since now you need to know ahead of time what the module type is (or write the transpiled source to another path and return a short circuit url to that instead), rather than being able to have a single loader capable of handling both.

isaacs avatar Apr 27 '23 19:04 isaacs

Verified that node --loader=ts-node/esm foo.ts stopped working on 4667b07cd2d6eb0295d6cd1f40163ef14ce538b0

isaacs avatar Apr 27 '23 19:04 isaacs

cc: @nodejs/loaders

cjihrig avatar Apr 28 '23 13:04 cjihrig

As stated in the ESM doc, async activity like console.log (which is specifically cited in the caveat) is not guaranteed to run.

Please try other methods, such as fs.writeSync to confirm.

I believe this is not a dupe of #47615 because it specifically provides execArgv, thus avoiding the fork-bomb.

JakobJingleheimer avatar Apr 28 '23 13:04 JakobJingleheimer

@JakobJingleheimer I've tried fs.writeSync() and process._rawDebug() with no luck.

cjihrig avatar Apr 28 '23 13:04 cjihrig

I’m not sure how would it work in the current design. There is (by design) one loader process for the whole Node.js process. In the example above, we are asking Node.js to start another one just for that specific thread.

We need a new API to handle this case, something that:

  1. allows us to create a "loaders worker", potentially exposing what we are already doing in core
  2. pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused.

mcollina avatar Apr 28 '23 13:04 mcollina

We need a new API to handle this case

What you are proposing seems like an explicit API. It seems like --experimental-loader passed to the Worker constructor should implicitly do that (without needing to expose more public APIs).

cjihrig avatar Apr 28 '23 13:04 cjihrig

It's interesting because it doesn't work --loader in the worker's execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

targos avatar Apr 28 '23 15:04 targos

It’s interesting because it doesn’t work --loader in the worker’s execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

I don’t think of worker threads like child processes that can have their own Node flags. I feel like they should inherit whatever the parent process has, which is what @targos is describing here.

GeoffreyBooth avatar Apr 28 '23 16:04 GeoffreyBooth

I don’t think of worker threads like child processes that can have their own Node flags.

That is the exact purpose of the execArgv option to the Worker constructor. Some flags are not supported there, but, for example, --require does appear to be. And the loaders docs do claim that they follow the pattern of --require

cjihrig avatar Apr 28 '23 16:04 cjihrig

the loaders docs do claim that they follow the pattern of --require

Only in the sense that the way that chaining works for multiple invocations of --require is the same way that chaining works for multiple invocations of --loader. There’s not much else that loaders share in common with require.

Unless there’s some argument for why this should be considered a bug, I think we should just update the docs for https://nodejs.org/api/worker_threads.html#new-workerfilename-options to clarify that --loader is one of the flags not supported by that option.

GeoffreyBooth avatar Apr 28 '23 16:04 GeoffreyBooth

I guess my question is, are there actually technical reasons why this can't be implemented?

cjihrig avatar Apr 28 '23 16:04 cjihrig

I guess my question is, are there actually technical reasons why this can't be implemented?

I was about to ask this too. If it works from main, I would've expected it to work here too, but if not, I imagine it would be fairly simple to support from the esm worker 🤔 (unless we get stuck in Inception)

JakobJingleheimer avatar Apr 28 '23 16:04 JakobJingleheimer

I guess my question is, are there actually technical reasons why this can’t be implemented?

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think); within that loaders thread there’s shared scope that the loaders can use, such as to keep caches and such. If we support execArgv then that would require spinning up a new loaders thread specific to that worker thread, because the --loader argument passed to execArgv might be different than that used for the root process. This presents a few issues:

  • There are now multiple loaders threads, with multiple scopes, and so loader authors would need to be aware of this possibility. We would probably get calls for ways to communicate between loaders threads.
  • The number of loaders threads would increase linearly with the number of worker threads spawned with execArgv, doubling the memory/processing requirements of each worker thread.

In short, it’s a lot of added complexity for both Node and for loader authors, and for what? I don’t see the benefit. If you want to create a new process with different loaders, spawn a child process.

GeoffreyBooth avatar Apr 28 '23 16:04 GeoffreyBooth

@GeoffreyBooth I agree with your assessment with supporting execArgv: it will cost too much memory and CPU time. I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases (mocking, transpiling, ...). Processes are more costly and harder to manage.

IMHO, we need a new API to handle this case, something that:

  1. allows us to create a "loaders worker", potentially exposing what we are already doing in core.
  2. pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused across multiple workers.

mcollina avatar Apr 28 '23 17:04 mcollina

I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

Keep in mind there’s also the registerLoader API in progress, that allows for registering loaders after startup.

GeoffreyBooth avatar Apr 28 '23 17:04 GeoffreyBooth

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

CLI tools can't really set Node.js options. registerLoader would solve all the same use cases of execArgv in this case. One can always set up a "jump" worker to get things done. I didn't know it was in the works, is there a tracking issue/pr?

mcollina avatar Apr 28 '23 17:04 mcollina

Looks like the PR is https://github.com/nodejs/node/pull/46826. I believe that would work for my use case.

I still think execArgv should be an option, but I also don't care enough to push for it if there is an alternative that works for me.

cjihrig avatar Apr 28 '23 17:04 cjihrig

I believe that would work for my use case.

What is your use case?

CLI tools can’t really set Node.js options.

We’ve been discussing adding support for .env files: https://github.com/nodejs/node/pull/43973#issuecomment-1249549346. This would be a shorter-term solution for being able to set NODE_OPTIONS from a file. Farther down the road I think we’d like to support a new field in package.json (or possibly even a new config file) that would do the same thing.

GeoffreyBooth avatar Apr 28 '23 18:04 GeoffreyBooth

What is your use case?

Using loaders in a worker without requiring users to specify CLI flags. Using a shebang is not really an option for me either, and ideally I don't want to spawn a child process just to hide the CLI flag. The fact that the loader thread is shared between workers and the main thread is not a problem for me.

cjihrig avatar Apr 28 '23 18:04 cjihrig

Using loaders in a worker without requiring users to specify CLI flags.

Can you explain a bit more? Is this a CLI tool? What does it do? Why does it create worker threads?

GeoffreyBooth avatar Apr 28 '23 18:04 GeoffreyBooth

I can't go into too much detail yet unfortunately, as it is for work and still unreleased. There is a CLI, but also an SDK. The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

cjihrig avatar Apr 28 '23 18:04 cjihrig

The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

If you find a good general solution for hot module reloading, please share it.

The registerLoader PR is by a new contributor and I’m not sure he’s working on it any more, so if you have time to help get it finished, that would be greatly appreciated.

GeoffreyBooth avatar Apr 28 '23 18:04 GeoffreyBooth

I will take a look at the registerLoader PR and can take it over if the current author is no longer interested in working on it.

cjihrig avatar Apr 28 '23 18:04 cjihrig

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence of the contrary, I think we should consider that there is a bug somewhere.

targos avatar May 01 '23 09:05 targos

Found the bug: when a worker is created with execArgv, its child loaders worker doesn't inherit the CLI arguments, so we have:

main thread: [] main thread loaders: [] worker thread: ['--experimental-loader', './loader.mjs'] worker thread loaders: []

targos avatar May 01 '23 09:05 targos

Edit: sorry, pinged in the wrong issue

targos avatar May 05 '23 14:05 targos

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence of the contrary, I think we should consider that there is a bug somewhere.

The intended behaviour is for there to be 1 esm worker, regardless of other workers. I'm not sure about child process—perhaps if they are separable they should have their own esm worker, but that behaviour is not currently defined.

JakobJingleheimer avatar May 05 '23 15:05 JakobJingleheimer

I'm in the same boat as other right now... If import('blob:nodedata:fb4b1f92-71eb-42f9-b371-26ed3204a625') had just worked as intended, then i wouldn't have to deal with this loader thing right now... i guess it is also the real possibility of not also being able to fetch a blob url from a different worker / off -thread script (https://github.com/nodejs/node/issues/46557)

if we are going to have multiple loader threads then i feel like it would be impossible to have one centralized globally blob store over all threads / workers if they aren't handled in core in a centralized maner where they can share resources

jimmywarting avatar May 10 '23 00:05 jimmywarting

@kirrg001 when behaviour changes between releases, step 1 is check the release notes:

https://nodejs.org/en/blog/announcements/v20-release-announce#custom-esm-loader-hooks-nearing-stable

Custom ES module lifecycle hooks supplied via loaders (--experimental-loader=./foo.mjs) now run in a dedicated thread, isolated from the main thread.

JakobJingleheimer avatar Jun 16 '23 09:06 JakobJingleheimer