node icon indicating copy to clipboard operation
node copied to clipboard

Move ESM loaders off-thread

Open JakobJingleheimer opened this issue 2 years ago • 1 comments

Resolves https://github.com/nodejs/node/issues/43658

JakobJingleheimer avatar Sep 18 '22 11:09 JakobJingleheimer

Review requested:

  • [ ] @nodejs/modules

nodejs-github-bot avatar Sep 18 '22 11:09 nodejs-github-bot

Does each worker thread get its own loader thread? E.g. if an app launches 2x worker threads, there will be 6x threads total: one main, 2x worker, 3x loader?

cspotcode avatar Sep 27 '22 18:09 cspotcode

The current design puts all loaders in the same worker thread (eg there are a total of 2 threads: the main and the worker). Internal things are handled via the main thread; userland things are handled by the worker thread.

JakobJingleheimer avatar Sep 27 '22 18:09 JakobJingleheimer

I think that's slightly different than what I'm asking about. If a node app uses the worker threads API to launch worker threads, does each get its own loader thread? (where each loader thread may have one or more loaders running within it)

Not does each loader get a thread, but does each user thread get a loader thread.

cspotcode avatar Sep 27 '22 18:09 cspotcode

If a node app uses the worker threads API to launch worker threads, does each get its own loader thread?

I think the current state (before this PR) is that loaders always execute in the main thread, even if they’re “for” user code that is in a worker thread. So I would think that after this PR, loaders would run inside their worker thread, regardless of whether they’re customizing main or worker thread user code.

So in other words, for an app using custom loaders where user code that spawns three workers, there are five threads total: main, loaders, and the three workers spawned by the user code.

This is just my guess, others can please correct me if I’m mistaken.

GeoffreyBooth avatar Sep 27 '22 18:09 GeoffreyBooth

I don't know how it's implemented but I would expect that each Node.js thread (main and workers) has its own separate loaders thread, at least for two reasons:

  • Worker threads are supposed to be as isolated as possible from the main thread and from each other
  • You can spawn a worker thread with a different set of --loader flags.

targos avatar Sep 27 '22 18:09 targos

First pass this seems fine but likely should investigate a thread per loader URL/some key. Having it per thread they instrument would be more costly if you spawn/tear down threads.

On Tue, Sep 27, 2022, 1:24 PM Michaël Zasso @.***> wrote:

I don't know how it's implemented but I would expect that each Node.js thread (main and workers) has its own separate loaders thread, at least for two reasons:

  • Worker threads are supposed to be as isolated as possible from the main thread and from each other
  • You can spawn a worker thread with a different set of --loader flags.

— Reply to this email directly, view it on GitHub https://github.com/nodejs/node/pull/44710#issuecomment-1259889616, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJI7YII3ZMWWAGIMOC6LWAM3WJANCNFSM6AAAAAAQPNEM7E . You are receiving this because you are on a team that was mentioned.Message ID: @.***>

bmeck avatar Sep 27 '22 18:09 bmeck

What remains to be done?

targos avatar Oct 08 '22 11:10 targos

Something in this implementation is causing node to hang on startup. We suspect it's some kind of circular dependency, and the subsequent dependency smacks into the Atomics lock (preventing the rest of the flow to complete, which would release the initial lock). I think this circular dep is between ESMLoader and Worker (it's rather difficult to troubleshoot as output is swallowed).

There is a working PoC, so we know this works in principle (and that the Atomics otherwise behave appropriately, so the problem lies with integrating it into node).

This was on hold whilst I was on holiday. I'm working on it today and tomorrow (but will be away on a business trip next week).

JakobJingleheimer avatar Oct 08 '22 11:10 JakobJingleheimer

I found the circular dep:

process/esm_loader.js:12modules/esm/public_loader_proxy.js:3worker_thread.js:29/36process/pre_execution.js:562process/esm_loader.js

JakobJingleheimer avatar Oct 08 '22 19:10 JakobJingleheimer

Gah, it wasn't a circular dependency 🤦‍♂️ it was setting url to null here:

https://github.com/nodejs/node/blob/9836c67198b2d01350f283e10b872849e81abefa/lib/internal/worker.js#L141-L149

which caused the worker to spawn with nothing in it.

(I shouldn't have copied it over from the previous attempt at off-threading, which was later hard-coding the url).

I'm not sure if the internal worker should go through the classic path, or merely get require()ed (I tried both, and it doesn't seem to make a difference).

There is (now was) a circular dependency issue after the empty worker issue is addressed in

https://github.com/nodejs/node/blob/9836c67198b2d01350f283e10b872849e81abefa/lib/internal/main/worker_thread.js#L142

that's now fixed too.

NODE_DEBUG=esm,worker ./out/Release/node -p 1 still doesn't print 1, BUT it does now print debugging output, and we can see it getting fairly far along now, so progress:

ESM 13018: hooking up public ESMLoader
ESM 13018: creating worker for publicESMLoader
[…]
WORKER 15812: instantiating Worker. url: file:///…/nodejs/node/lib/internal/modules/esm/worker.js doEval: classic
WORKER 15812: [0] created Worker with ID 1

esm/worker.js's debug (worker for public ESM running) is not printing, so that's likely why things have stalled (as soon as the worker is running, it releases the lock).

JakobJingleheimer avatar Oct 09 '22 12:10 JakobJingleheimer

I'm currently investigating the V8 / C++ error thrown when the Worker is instantiated:

node:internal/worker:202
    this[kHandle] = new WorkerImpl(url,
                    ^
This Environment was initialized without a V8::Inspector
Thrown at:
    at Worker (node:internal/worker:202:21)
    at InternalWorker (node:internal/worker:446:5)
    at node:internal/modules/esm/public_loader_proxy:39:16
    at compileForInternalLoader (node:internal/bootstrap/loaders:331:7)
    at nativeModuleRequire (node:internal/bootstrap/loaders:362:14)
    at node:internal/process/esm_loader:49:45
    at compileForInternalLoader (node:internal/bootstrap/loaders:331:7)
    at nativeModuleRequire (node:internal/bootstrap/loaders:362:14)
    at node:internal/modules/cjs/loader:141:18
    at compileForInternalLoader (node:internal/bootstrap/loaders:331:7)

This error is also occurring in https://github.com/nodejs/node/pull/44732#issuecomment-1253914637.

JakobJingleheimer avatar Oct 23 '22 19:10 JakobJingleheimer

Adding a “request changes” here to see https://github.com/nodejs/node/issues/43658#issuecomment-1288170130 resolved.

It might be a little premature to drop a block: The loaders team (what're we called now?) also want the numbers you're concerned about, and I'm not gonna be the guy to tank node's performance 😉

Feel free to leave the block if you'd rather, but I fear leaving it might give passersby the wrong takeaway on the status of the PR.

JakobJingleheimer avatar Oct 23 '22 20:10 JakobJingleheimer

Now node --input-type=module -e 'console.log(1)' is creating the following error:

Error: [object Module] could not be cloned.
    at serialize (node:v8:371:7)
    at node:internal/modules/esm/worker:81:29
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

We would need a way to pass those to one thread to another.

aduh95 avatar Nov 06 '22 17:11 aduh95

@nodejs/v8 would anyone be able to provide some insight into the V8 error we're encountering described in https://github.com/nodejs/node/pull/44710#issuecomment-1288184538. The error is triggered in this branch by running anything that invokes ESM, such as ./node --trace-uncaught --test ./test/es-module/test-esm-loader-chaining.mjs

JakobJingleheimer avatar Nov 17 '22 23:11 JakobJingleheimer

I have a problem with merging this PR into Node 18. The problem is technical: Node v14 is still supported by Node, but it does not have the message passing capability that Node 16 and 18 has (no port parameter). See the api for getGlobalPreloadCode in Node 14 for proof.

I found this out when implementing the use of the message channel for communication between the main user threads and the loader thread. It worked well in 16 and 18, but obviously failed in 14.

Unfortunately, the change to implement two communication methods (the old one for Node 14 and the new one for Node 18) is a big hassle. So I will be a bit stuck between a rock and a hard place if this PR lands in v18. If it lands in 19/20, then I'm assuming we can start ignoring Node 14, and we'll be mostly OK.

giltayar avatar Nov 22 '22 07:11 giltayar

@JakobJingleheimer It is not a V8 error. It comes from our code. The throw was added in https://github.com/nodejs/node/pull/35025 (I'm not able to investigate more than this).

targos avatar Nov 22 '22 07:11 targos

I would recommend to not consider support for v14 and even v16 as a requirement. They are both going out of support in 2023.

mcollina avatar Nov 22 '22 08:11 mcollina

I would recommend to not consider support for v14 and even v16 as a requirement. They are both going out of support in 2023.

We can do that if we have to, but we would really like to ship a stable Loaders API as far back as possible, so that as many users as possible can migrate to it.

GeoffreyBooth avatar Nov 22 '22 17:11 GeoffreyBooth

I think it's a moot point: I think one of the key PRs for Loaders was not easily backported to v14 (and I didn't get to it), precluding a backport of this (and any other subsequent PRs).

JakobJingleheimer avatar Nov 22 '22 20:11 JakobJingleheimer

Not sure if I have enough time to dig into the error myself, but from a quick glance I think the error is caused by trying to access an inspector agent in the worker even when the worker is created with kNoCreateInspector somehow (which is odd, I think all workers are supposed to have this flag set to false by default). You can try setting a breakpoint in ThrowUninitializedInspectorError() and see where the error is thrown, and see if env->should_create_inspector() is the cause of that throw, if it is, try to trace back to where the kNoCreateInspector flag is coming from.

joyeecheung avatar Nov 23 '22 16:11 joyeecheung

(I also take it that you do want to be able to inspect the loader thread...so you do want to leave kNoCreateInspector flag to zero. Otherwise, you'd need to somehow set that flag to 1 internally when creating the loader worker, see Worker::New() on how the flags are modified from options passed from the JS land, and also make sure that all the places guarded by ThrowUninitializedInspectorError() are not hit from the loader worker).

joyeecheung avatar Nov 23 '22 16:11 joyeecheung

@joyeecheung thanks! I'll try to figure out this weekend how to do that.

Yes, the worker/thread would ideally be inspectable (but I wasn't trying to do that when I encountered the error).

JakobJingleheimer avatar Nov 23 '22 16:11 JakobJingleheimer

I just rebased on main. I was able to reproduce the issues cited above. To summarize:

./node --trace-uncaught --test ./test/es-module/test-esm-loader-chaining.mjs
TAP version 13
node:internal/test_runner/harness:37
      throw err;
      ^

Error [ERR_UNHANDLED_ERROR]: Unhandled error. ('This Environment was initialized without a V8::Inspector')
    at new NodeError (node:internal/errors:400:5)
    at InternalWorker.emit (node:events:508:17)
    at [kOnErrorMessage] (node:internal/worker:312:10)
    at [kOnMessage] (node:internal/worker:323:37)
    at MessagePort.<anonymous> (node:internal/worker:218:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:734:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28) {
  code: 'ERR_UNHANDLED_ERROR',
  context: 'This Environment was initialized without a V8::Inspector'
}
./node --eval 'console.log(1)'
node:internal/modules/cjs/loader:1165
    throw err;
    ^

TypeError: Cannot read properties of undefined (reading 'set')
    at internalCompileFunction (node:internal/vm:102:17)
    at wrapSafe (node:internal/modules/cjs/loader:1141:20)
    at Module._compile (node:internal/modules/cjs/loader:1182:27)
    at runScript (node:internal/process/execution:82:27)
    at evalScript (node:internal/process/execution:104:10)
    at node:internal/main/eval_string:50:3
./node --input-type=module --eval 'console.log(1)'
^C # Kill hung process

GeoffreyBooth avatar Dec 04 '22 05:12 GeoffreyBooth

@JakobJingleheimer @nodejs/v8 @targos @MoLow I’ve made a small discovery that might point to the cause of the This Environment was initialized without a V8::Inspector error. So per above we were getting that error by running ./node --trace-uncaught --test ./test/es-module/test-esm-loader-chaining.mjs. If you make the following change to src/env-inl.h:

inline bool Environment::should_create_inspector() const {
-  return (flags_ & EnvironmentFlags::kNoCreateInspector) == 0 &&
-         !options_->test_runner && !options_->watch_mode;
+  return (flags_ & EnvironmentFlags::kNoCreateInspector) == 0;
}

Then the error disappears, and the process hangs like it does for ./node --input-type=module --eval 'console.log(1)'. An __error.log file is generated, containing this:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'filename' must be a file URL object, file URL string, or absolute path string. Received undefined
    at new NodeError (node:internal/errors:400:5)
    at createRequire (node:internal/modules/cjs/loader:1337:11)
    at file:///Users/geoffrey/Sites/node/test/common/index.mjs:3:17
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25)

While this kNoCreateInspector flag was added in https://github.com/nodejs/node/pull/35025, the second line about !options_->test_runner && !options_->watch_mode was added in https://github.com/nodejs/node/pull/44366. So it appears that something about --test is causing this error. This might be a bug in general, not specific to this PR, but I don’t have time at the moment to try to reproduce it in a minimal context (cc @MoLow).

@JakobJingleheimer For the purposes of this PR I would just test it using a command other than --test, and hopefully the --test issue can be investigated separately.

GeoffreyBooth avatar Dec 04 '22 06:12 GeoffreyBooth

While this kNoCreateInspector flag was added in https://github.com/nodejs/node/pull/35025, the second line about !options_->test_runner && !options_->watch_mode was added in https://github.com/nodejs/node/pull/44366. So it appears that something about --test is causing this error. This might be a bug in general, not specific to this PR, but I don’t have time at the moment to try to reproduce it in a minimal context (cc @MoLow).

the objective of the change in https://github.com/nodejs/node/pull/44366 was to make --inspect (with all of its variations) only affect the "inner" process since both --test and --watch have a wrapping process with internal node code, --inspect should not open a debugger session on the external process.

we might want to only pass --trace-uncaught to the inner process and ignore it in the wrapper process

MoLow avatar Dec 04 '22 06:12 MoLow

./node --trace-uncaught ./test/es-module/test-esm-loader-chaining.mjs

is now passing, and errors from the worker are now surfaced (and the process terminates gracefully)

Something related to the --test trigers such an error

./node --trace-uncaught --test ./test/es-module/test-esm-loader-chaining.mjs

Also, I believe I've found the cause of the v8.serialize() failure for

./node --input-type="module" -e "console.log(1)"

ESMLoader::eval() returns a Module instance, and Module::wrapper is a Proxy, which is not serialisable. I'll dig into that (chatted with Geoffrey, and we have a couple avenues to try).

JakobJingleheimer avatar Dec 06 '22 23:12 JakobJingleheimer

the objective of the change in #44366 was to make --inspect (with all of its variations) only affect the “inner” process since both --test and --watch have a wrapping process with internal node code, --inspect should not open a debugger session on the external process.

we might want to only pass --trace-uncaught to the inner process and ignore it in the wrapper process

@MoLow it seems like this change breaks ./node --trace-uncaught --test ./test/es-module/test-esm-loader-chaining.mjs when worker threads are used, which happens as a result of the changes in this PR. I suspect that node --test some-code-that-spawns-workers.js is probably also broken. Do you have time to look into this? Perhaps there’s a different way to prevent the inspector from covering the “wrapper” process, or we can revert that change for now (to un-break using --test with workers) and add it back once a new method is determined?

GeoffreyBooth avatar Dec 06 '22 23:12 GeoffreyBooth

@GeoffreyBooth Il take a deep look at this on this weekend

MoLow avatar Dec 08 '22 07:12 MoLow

@GeoffreyBooth @JakobJingleheimer I tryed reproducing with

// worker.js
'use strict';

const { parentPort } = require('worker_threads');

parentPort.on('message', (msg) => {
  parentPort.postMessage(msg);
  setTimeout(() => { throw new Error('test'); }, 10);
});

import { describe, it } from 'node:test'
import { deepEqual, fail } from 'node:assert'
import { Worker } from 'node:worker_threads'



it('test', async () => {
  const worker = new Worker("./worker.js");
  worker.on('online', console.log);
  worker.postMessage('hello');
  await new Promise((resolve) => {
    worker.on('message',(...args) => {
      deepEqual(args, ['hello']);
      setTimeout(() => {
        worker.terminate();
        resolve();
      }, 100);
    });
  });
});

but this does not seem to reproduce any issues with --trace-uncaught, what am I missing?

MoLow avatar Dec 09 '22 07:12 MoLow