ESM loaders cannot be defined via `Worker` option `execArgv` in v20
Version
20.0.0 (tested on main at 36e4e3d86b31de7f38125aacad810e40295fd2ae too)
Platform
macOS but probably all platforms
Subsystem
esm
What steps will reproduce the bug?
The following works in Node 18 and 19, but not 20:
```js
// main.mjs
import { Worker } from 'node:worker_threads';

new Worker('./worker.js', {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
```

```js
// worker.js
'use strict';

async function main() {
  await import('node:fs');
}

main();
```

```js
// loader.mjs
export function resolve(specifier, context, nextResolve) {
  throw new Error('boom');
}
```
Run `node main.mjs`. In Node 18 and 19, the exception in the loader is thrown. In Node 20 it is not. The problem is not unique to throwing: I haven't been able to see `console.log()` output or any other indication that the loader is being called.
How often does it reproduce? Is there a required condition?
100% of the time for me.
What is the expected behavior? Why is that the expected behavior?
I expect it to work as it did in earlier versions of Node. This is the expected behavior because right now it doesn't seem to work at all.
What do you see instead?
See the description of the bug above.
Additional information
Just a guess, but I'm assuming this is related to https://github.com/nodejs/node/pull/44710.
Another side effect of moving ES loaders to a separate thread: updating `require.extensions` in an ES loader no longer has any effect, meaning that `node --loader=ts-node/esm` no longer functions properly to load both CJS and MJS TypeScript.
This makes it significantly more challenging to have loaders that work for both esm and cjs, effectively breaking the esm on-ramp for those of us supporting both modes, since now you need to know ahead of time what the module type is (or write the transpiled source to another path and return a short circuit url to that instead), rather than being able to have a single loader capable of handling both.
Verified that `node --loader=ts-node/esm foo.ts` stopped working on 4667b07cd2d6eb0295d6cd1f40163ef14ce538b0.
cc: @nodejs/loaders
As stated in the ESM doc, async activity like `console.log` (which is specifically cited in the caveat) is not guaranteed to run. Please try other methods, such as `fs.writeSync`, to confirm.
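For instance, a minimal sketch of a resolve hook that logs synchronously instead of relying on `console.log()`:

```js
// loader.mjs — sketch: log synchronously from a hook, since async output
// such as console.log() is not guaranteed to flush on the loaders thread.
import { writeSync } from 'node:fs';

export function resolve(specifier, context, nextResolve) {
  // fd 1 is stdout; writeSync blocks until the bytes are written.
  writeSync(1, `resolve() called for ${specifier}\n`);
  return nextResolve(specifier, context);
}
```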
I believe this is not a dupe of #47615 because it specifically provides `execArgv`, thus avoiding the fork-bomb.
@JakobJingleheimer I've tried `fs.writeSync()` and `process._rawDebug()` with no luck.
I'm not sure how it would work in the current design. There is (by design) one loader process for the whole Node.js process. In the example above, we are asking Node.js to start another one just for that specific thread.

We need a new API to handle this case, something that:

- allows us to create a "loaders worker", potentially exposing what we are already doing in core
- passes that loader down to `new Worker(path, { loader })`, so that the loader thread could be reused
> We need a new API to handle this case

What you are proposing seems like an explicit API. It seems like `--experimental-loader` passed to the `Worker` constructor should implicitly do that (without needing to expose more public APIs).
It's interesting: it doesn't work with `--loader` in the worker's `execArgv`, but it works fine if `--loader` is passed to the main script (the worker inherits the flag).
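That observation doubles as a workaround sketch: apply the loader to the whole process and let the worker inherit it (file names as in the original report):

```js
// main.mjs — run with: node --experimental-loader ./loader.mjs main.mjs
import { Worker } from 'node:worker_threads';

// No execArgv needed: the worker inherits the parent's loader flag.
new Worker('./worker.js');
```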
> It's interesting: it doesn't work with `--loader` in the worker's `execArgv`, but it works fine if `--loader` is passed to the main script (the worker inherits the flag).
I don’t think of worker threads like child processes that can have their own Node flags. I feel like they should inherit whatever the parent process has, which is what @targos is describing here.
> I don't think of worker threads like child processes that can have their own Node flags.

That is the exact purpose of the `execArgv` option to the `Worker` constructor. Some flags are not supported there, but, for example, `--require` does appear to be. And the loaders docs do claim that they follow the pattern of `--require`.
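For comparison, a sketch of the `--require` case that does appear to work (`./setup.cjs` is a placeholder preload script):

```js
// main.mjs — sketch: --require in a worker's execArgv is honored,
// which is why the same was expected of --experimental-loader.
import { Worker } from 'node:worker_threads';

new Worker('./worker.js', {
  execArgv: ['--require', './setup.cjs'], // placeholder preload script
});
```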
> the loaders docs do claim that they follow the pattern of `--require`

Only in the sense that the way chaining works for multiple invocations of `--require` is the same way chaining works for multiple invocations of `--loader`. There's not much else that loaders share in common with `require`.
Unless there's some argument for why this should be considered a bug, I think we should just update the docs for https://nodejs.org/api/worker_threads.html#new-workerfilename-options to clarify that `--loader` is one of the flags not supported by that option.
I guess my question is, are there actually technical reasons why this can't be implemented?
> I guess my question is, are there actually technical reasons why this can't be implemented?
I was about to ask this too. If it works from main, I would've expected it to work here too, but if not, I imagine it would be fairly simple to support from the esm worker 🤔 (unless we get stuck in Inception)
> I guess my question is, are there actually technical reasons why this can't be implemented?

In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think); within that loaders thread there's shared scope that the loaders can use, such as to keep caches. If we support `execArgv`, that would require spinning up a new loaders thread specific to that worker thread, because the `--loader` argument passed to `execArgv` might be different from the one used for the root process. This presents a few issues:

- There are now multiple loaders threads, with multiple scopes, and so loader authors would need to be aware of this possibility. We would probably get calls for ways to communicate between loaders threads.
- The number of loaders threads would increase linearly with the number of worker threads spawned with `execArgv`, doubling the memory/processing requirements of each worker thread.
In short, it’s a lot of added complexity for both Node and for loader authors, and for what? I don’t see the benefit. If you want to create a new process with different loaders, spawn a child process.
@GeoffreyBooth I agree with your assessment about supporting `execArgv`: it would cost too much memory and CPU time. I still think allowing a custom loader to be provided for each worker is very useful for all sorts of use cases (mocking, transpiling, ...). Processes are more costly and harder to manage.
IMHO, we need a new API to handle this case, something that:

- allows us to create a "loaders worker", potentially exposing what we are already doing in core.
- passes that loader down to `new Worker(path, { loader })`, so that the loader thread could be reused across multiple workers (a hypothetical sketch follows below).
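To make the shape of the proposal concrete, here is a hypothetical sketch; `createLoaderThread()` and the `loader` option are invented names for illustration, not an existing Node.js API:

```js
// Hypothetical sketch — createLoaderThread() and { loader } are invented
// names illustrating one loaders thread shared by several workers.
import { Worker, createLoaderThread } from 'node:worker_threads';

const loader = await createLoaderThread(['./loader.mjs']); // invented API
new Worker('./worker-a.js', { loader }); // invented option
new Worker('./worker-b.js', { loader }); // same loaders thread, shared scope
```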
> I still think allowing a custom loader to be provided for each worker is very useful for all sorts of use cases
What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?
Keep in mind there's also the `registerLoader` API in progress, which allows for registering loaders after startup.
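For context, a minimal sketch of runtime registration in the shape that eventually shipped as `register()` from `node:module` (the PR discussed here used the working name `registerLoader`, so details may differ):

```js
// Sketch: register module customization hooks at runtime, without CLI flags.
import { register } from 'node:module';

// Imports that happen after this call go through ./loader.mjs.
register('./loader.mjs', import.meta.url);

await import('./app.mjs'); // './app.mjs' is a placeholder entry point
```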
> What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn't want applied to the main application thread?
CLI tools can't really set Node.js options. `registerLoader` would solve all the same use cases as `execArgv` in this case. One can always set up a "jump" worker to get things done (see the sketch below).

I didn't know it was in the works; is there a tracking issue/PR?
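A sketch of the "jump" worker idea mentioned above (all file names are illustrative):

```js
// main.mjs — sketch: spawn a tiny bootstrap worker that does the setup
// a CLI flag would otherwise do, then imports the real worker code.
import { Worker } from 'node:worker_threads';

new Worker('./bootstrap.mjs', {
  workerData: { entry: './real-worker.mjs' }, // illustrative paths
});
```

```js
// bootstrap.mjs — runs inside the worker thread.
import { workerData } from 'node:worker_threads';

// ...perform setup here (e.g. register hooks once an API exists)...
await import(workerData.entry); // jump to the real worker entry point
```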
Looks like the PR is https://github.com/nodejs/node/pull/46826. I believe that would work for my use case.
I still think `execArgv` should be an option, but I also don't care enough to push for it if there is an alternative that works for me.
> I believe that would work for my use case.
What is your use case?
> CLI tools can't really set Node.js options.

We've been discussing adding support for `.env` files: https://github.com/nodejs/node/pull/43973#issuecomment-1249549346. This would be a shorter-term solution for being able to set `NODE_OPTIONS` from a file. Farther down the road I think we'd like to support a new field in `package.json` (or possibly even a new config file) that would do the same thing.
> What is your use case?
Using loaders in a worker without requiring users to specify CLI flags. Using a shebang is not really an option for me either, and ideally I don't want to spawn a child process just to hide the CLI flag. The fact that the loader thread is shared between workers and the main thread is not a problem for me.
> Using loaders in a worker without requiring users to specify CLI flags.
Can you explain a bit more? Is this a CLI tool? What does it do? Why does it create worker threads?
I can't go into too much detail yet unfortunately, as it is for work and still unreleased. There is a CLI, but also an SDK. The loader itself is just used for hot module reloading, so I think `registerLoader()` will work just fine.
> The loader itself is just used for hot module reloading, so I think `registerLoader()` will work just fine.
If you find a good general solution for hot module reloading, please share it.
The `registerLoader` PR is by a new contributor and I'm not sure he's working on it any more, so if you have time to help get it finished, that would be greatly appreciated.
I will take a look at the `registerLoader` PR and can take it over if the current author is no longer interested in working on it.
> In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence to the contrary, I think we should consider that there is a bug somewhere.
Found the bug: when a worker is created with `execArgv`, its child loaders worker doesn't inherit the CLI arguments, so we have:

```text
main thread:           []
main thread loaders:   []
worker thread:         ['--experimental-loader', './loader.mjs']
worker thread loaders: []
```
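The first two rows of that listing can be reproduced from user code; the loaders rows presumably required inspecting Node's internal loaders workers. A sketch:

```js
// main.mjs — sketch: print execArgv as seen by the main thread,
// then spawn the worker from the original report.
import { Worker } from 'node:worker_threads';

console.log('main thread:', process.execArgv); // []
new Worker('./worker.js', {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
```

```js
// worker.js — inside the worker, process.execArgv reflects the
// execArgv option: ['--experimental-loader', './loader.mjs']
console.log('worker thread:', process.execArgv);
```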
> > In the current behavior, there's a loaders thread and it works for both the main application thread and all worker threads (I think)
>
> That's wrong. Each worker thread has its own loaders thread. Without evidence to the contrary, I think we should consider that there is a bug somewhere.
The intended behaviour is for there to be 1 esm worker, regardless of other workers. I'm not sure about child processes; perhaps, if they are separable, they should have their own esm worker, but that behaviour is not currently defined.
I'm in the same boat as others right now...

If `import('blob:nodedata:fb4b1f92-71eb-42f9-b371-26ed3204a625')` had just worked as intended, then I wouldn't have to deal with this loader thing right now. There is also the real possibility of not being able to fetch a blob URL from a different worker / off-thread script (https://github.com/nodejs/node/issues/46557).

If we are going to have multiple loader threads, then I feel like it would be impossible to have one centralized global blob store across all threads / workers if they aren't handled in core in a centralized manner where they can share resources.
@kirrg001 when behaviour changes between releases, step 1 is to check the release notes:
https://nodejs.org/en/blog/announcements/v20-release-announce#custom-esm-loader-hooks-nearing-stable

> Custom ES module lifecycle hooks supplied via loaders (`--experimental-loader=./foo.mjs`) now run in a dedicated thread, isolated from the main thread.