loaders Replace globalPreload

trafficstars

This hook is extremely confusing, even for maintainers.

The replacement needs to address:

cross-thread communication
determining what the entry-point is (since it's stripped from process.argv) as identified by https://github.com/nodejs/help/issues/4190 (eg an ESM equivalent of require.main)

Assuming there is 1 loader worker.

Spit-balling ideas:

expose a message port on node:module
return a message port from register()

Will a message port need to be refreshed?

@giltayar,I would be very interested in your thoughts (or even a design proposal) for this, especially because you recently handled this in testdouble, so the pain is fresh 😅

Jun 20 '23 14:06 JakobJingleheimer

The most intuitive way for me is that the register returns a port that is used to communicate with this loader (how is this done today? It can't be the same port for all the loaders, and yet they all run in the same worker). In this option, we would have a function in the loader (initialize) which will accept the port to be used to communicate.

Another variant of this option is for the register option to accept a MessageChannel (instead of creating it and returning it), which the user of register can use to communicate with the loader. The same initialize function can be used (we can even call globalPreload with the port for backward compatibility! 😁). The nice thing about this option is that if the user doesn't pass the MessageChannel to the register, then it signals that there is no need to call initialize and no need for MessagePort support.

In addition to passing a MessageChannel, the register should allow passing any data whatsoever to the initialize, e.g. the userland code can now pass process.argv to the loader to satisfy the requirement in https://github.com/nodejs/help/issues/4190. And actually, the user of register can just create a MessageChannel, pass the port2 via this data (it is a transferable object) and that's it: no need for any Node.js code for ports.

To summarize the above and define what is for me the best option: register can pass any data it wants to the loader, which will be passed to the exported initialize function of the loader. Additionally, if the user of register wants to communicate with the loader, it can just create a MessageChannel and pass the port to the loader as data.

Jun 21 '23 05:06 giltayar

I like that idea! I made a quick PoC here: https://github.com/targos/node/commit/c4f5e91ca4d4219e13f6f18f14b6101a26f96be5

$ ./node --no-warnings --loader "data:text/javascript," poc.mjs
initialize loader { entryPoint: '/Users/mzasso/git/nodejs/node/poc.mjs' }
loader initialized

Jun 21 '23 15:06 targos

that'd be amazing! PR?

Jun 21 '23 15:06 mcollina

Sweeeet!

Lemme get the existing register PR stable (I think there are 1–2 small issues with it remaining), and then let's get that in 🎉

PS haven't had time to thoroughly review Gil's ideas, but I liked where they're going.

Jun 21 '23 16:06 JakobJingleheimer

One thing to consider is that the point of the communications port isn’t just to pass data at initialization time. There are plenty of use cases where loaders need to communicate back and forth to the main thread after initialization.

Now maybe all this means is that we should rename initialize to something like onMessage, if it runs on every message, or initialize could return a function that runs on every message. But we should handle this use case in the design somehow.

Edit: I just looked at the POC and I see the intent, that you’re passing in the port and can register callbacks to it independently from us needing to provide infrastructure for them. I think it can work.

Before converting this into a full-fledged PR, can someone update the docs in the POC so that I can see what the intended UX is supposed to be? I think it would be good to workshop that before someone goes and implements it. What I think is the proposed UX seems fine to me at first glance, but having it spelled out in the instructions we would write for loader authors would help me fully understand it.

Jun 21 '23 16:06 GeoffreyBooth

@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.

Happy to help in shipping this.

Jul 10 '23 09:07 mcollina

@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.

Happy to help in shipping this.

I just got back from vacation.

Before I left, there appeared to be 1 issue in https://github.com/nodejs/node/pull/48439 related to sequence that I didn't have time to resolve. Picking back up this week (you're welcome to take a gander at it too).

PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.

Jul 10 '23 09:07 JakobJingleheimer

PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.

https://github.com/nodejs/node/pull/48559

Jul 10 '23 11:07 GeoffreyBooth

@GeoffreyBooth @JakobJingleheimer I updated my PR to address this issue 😄 I followed the proposal from @giltayar and the signature for register is updated to be able to pass arbitrary data to a loader's initialize hook, including two additional tests demonstrating the capability. This change exists in its own commit in order to be able to be reviewed separately.

Note, however, that I opted to add an additional specific transferList parameter instead of attempting to automatically deeply inspect the sent data, so the signature for register is now:

register(loader: string, parentUrl?: string, data?: any, transferList?: any[]): unknown;

The return value from the initialize hook is sent back, but there is currently no affordance for this to also provide a transferList and thus ports or other shared buffers cannot currently be returned from initialize.

I did not remove globalPreload as I am not yet comfortable enough knowing which bits and pieces are safe to pull out or if a deprecation notice is preferred over simply deleting the whole thing.

Jul 10 '23 23:07 izaakschroeder

This change exists in its own commit in order to be able to be reviewed separately.

Thank you so much! Do you mind opening this as its own PR? And it can include/depend on the earlier one. And the earlier PR can exclude the new commit so that it can be reviewed on its own.

Jul 11 '23 00:07 GeoffreyBooth

Done!

PR is here: https://github.com/izaakschroeder/node/pull/1 I'm not sure how I can put the PR on node's repo without the base branch also living in node's repo 🤔 Would you prefer I just target node:main instead and include the previous commit? Or?

Jul 11 '23 00:07 izaakschroeder

I’m not sure how I can put the PR on node’s repo

So your first PR is based on your support-nested-loader-chains branch. I suggest that you:

Branch off of that branch in your repo, something like support-nested-loader-chains-and-register-args, so that the two branches both point to the same commit.
Rebase the support-nested-loader-chains branch to drop the newest, register-related commit.
Force-push support-nested-loader-chains to your repo. This will reset the PR to exclude the newest commit.
Open a new PR with support-nested-loader-chains-and-register-args against node:main and mention in the PR description that it builds off https://github.com/nodejs/node/pull/48559.

That’s it. This is the general procedure for splitting a large PR into smaller ones.

Jul 11 '23 03:07 GeoffreyBooth

Note, however, that I opted to add an additional specific transferList parameter instead of attempting to automatically deeply inspect the sent data, so the signature for register is now:

Why do we have both data and transferList as separate arguments? What’s the difference between them?

@izaakschroeder did you see/use @targos’ POC in https://github.com/nodejs/loaders/issues/147#issuecomment-1601085150?

Jul 11 '23 03:07 GeoffreyBooth

I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.

I work at New Relic on the Node.js agent team and I'm working through an update to our loader to support ESM instrumentation. I'm finding the isolating loader to their own thread is causing a few headaches. If globalPreload is getting removed how will I be able to initialize something in the loader that is in the main thread? Is this with the new initialize or register hooks?

Also if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?(New Relic, Datadog, Elastic, etc)? In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way? I don't know if I'm too late to the party but as someone that's not directly involved in this, the idea of running loaders in separate threads is causing more headaches for vendors instead of an easy to use API to do it. I'd rather go back to monkey patching a built in like Module and getting access to packages like this. I worry that this will soon be stable and the hook points available look great on paper but when you see what has to be done within, it is creating a different problem and risk.

Jul 27 '23 14:07 bizob2828

I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.

Hi @bizob2828 and thanks for your feedback. Yes this is a good place.

If globalPreload is getting removed how will I be able to initialize something in the loader that is in the main thread?

It’s the other way around: the main thread will initialize the loader. Instead of your users running --loader new-relic/loader or whatever they do for ESM now, they would instead run --import new-relic/register. And within your register module would be code that calls module.register() to register your loader hooks and get a communications port to send messages between the main and loader threads; and then do whatever other initialization you want to do on the main thread. This approach provides easy access to the main thread, which is why we’re going this way (and may phase out --loader, as it’s not really needed anymore).

if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?

Please read the whole thread at https://github.com/nodejs/node/issues/47888, this was discussed in detail. The short version is that we’re not trying to make your jobs difficult, but we’re constrained by both the ESM spec and by V8’s capabilities. However we can make things easier within our power, that doesn’t break something else, we’re happy to do—or are happy to code review PRs for. As you might have noticed, there is a very small group working on these features and we have very limited time, so you might need to volunteer some of your own time to implement some of your requests.

I’m finding the isolating loader to their own thread is causing a few headaches.

I understand, but it’s necessary for the loader hooks to eventually support CommonJS, as the only way to “syncify” our hooks to allow customizing the sync require calls is by using a separate thread. Ditto for supporting the sync import.meta.resolve API. There were many other reasons as well; the short version is that this was years in the making and it’s not some minor change that we’re open to reverting. I understand it’s annoying from your perspective, but it’s necessary to achieve all the goals we’re aiming to satisfy with the Loaders API. We’re only two PRs away from getting to the end of our road map for what we need to land in order to call the API stable; so pretty soon the churn should end and you’ll be able to rely on this without so much effort.

Jul 27 '23 17:07 GeoffreyBooth

In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way?

This is a different but related request to my ask that we get hook access into node's CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.

I found one place that I shared it: https://github.com/nodejs/node/issues/43818#issuecomment-1184415188 I feel like it was captured into markdown somewhere, just can't remember where.

Jul 27 '23 18:07 cspotcode

This is a different but related request to my ask that we get hook access into node’s CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.

That list got migrated into the Milestone 3 list: https://github.com/nodejs/loaders#milestone-3-usability-improvements. The CommonJS static analysis is on there, the one about cjs-module-lexer.

FYI while it’s not currently available from within Node, like import lexer from 'node:cjs-module-lexer' or something like that, it is available as a package that you can add as a dependency to your projects: https://github.com/nodejs/cjs-module-lexer.

Jul 27 '23 20:07 GeoffreyBooth

Hello Loaders team. Will this eventually end up in all version lines, or will old versions still use globalPreload? Trying to decide whether I should keep the preload logic in my loader, or plan to just drop it completely when register shows up. My idea now is to target LTS, v16+.

Jul 31 '23 19:07 bojavou

The entire Loaders API is experimental, so there’s no reason that it wouldn’t get backported to all lines. That would probably be the ideal outcome, as then developers like you wouldn’t need to maintain support for variations of APIs.

In practice it’ll probably need to be stable on 20 for a while before it gets backported to 18, possibly for months or even a year, so if you’re eager to support 18 and below before then you might need to implement two versions.

Jul 31 '23 19:07 GeoffreyBooth

v16.x reaching EOL in one month, odds are virtually 0 that it will be backported there.

Jul 31 '23 22:07 aduh95

I think we should keep this issue on topic (globalPreload) and not talk about the backporting of the new loader to v18 (might be worthwhile to do that in another issue)

Aug 01 '23 08:08 mcollina

Question:

I'm responsible for the ESM implementation of module mocking in testdouble. In that implementation we have a file testdouble.js that implements the mocking API by communicating with the loader.

What I'd like to do, so that users won't need to --import testdouble-loader.js, is to register the loader in testdouble.js. The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?

You could say that given that the registration happens when the testdouble.js module is loaded, and that doesn't happen twice, then calling register shouldn't happen and there's no need to check this.

True! But what happens when testdouble.js is loaded twice because it is used in two workers?

Hmm... this is actually an important question: if I have two workers that want to communicate with the loader. Is calling register allowed once per worker? And should the loader store an array of ports that it uses to communicate with all the workers that register-ed it? I believe it should.

(Sorry, a bit confused. But I think my train of thought here is important to understand the needs of API-s that need to communicate with loaders)

Aug 03 '23 04:08 giltayar

The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?

As far as what node will tell you, there's no straightforward way to determine that (ex you can't do registered.has(myLoader)). I can think of a couple heinous ways to figure it out. Where do you need to know–in import's module, in your loader, or from the app/entry-point?

True! But what happens when testdouble.js is loaded twice because it is used in two workers?

Do you mean node's ESM worker? I think there may currently be unintended/undesired behaviour making that possible, but we'll squash that.

if I have two workers that want to communicate with the loader.

What are these workers / where did they come from? Are they spawned by the entry point?

Aug 21 '23 20:08 JakobJingleheimer

@giltayar maybe? I can help… I built much of the new register machinery so perhaps I could try to build out an example that fits here?

`testdouble-super-register.js`

import {MessageChannel, Worker, isMainThread, workerData} from 'node:worker_threads';
import {register} from 'node:module';

const TestDoubleLock = Symbol();

// > True! But what happens when testdouble.js is loaded twice because it is used in two workers?
// This will prevent testdouble.js from executing `register` multiple times from other workers
// Node _should_ generally only execute a single module once but I people could get likely
// around that if they tried. You could add an additional guard on `globalThis` which is shared
// between every module on a single V8 context (~thread). 
if (isMainThread && !globalThis[TestDoubleLock]) {
  globalThis[TestDoubleLock] = true;

  const controlChannel = new MessageChannel();

  const result = register('testdouble-super-loader.js', {
    parentURL: import.meta.url,
    data: {port: controlChannel.port2},
    transferList: [controlChannel.port2], 
  });

  if (result.error) {
    throw new Error('Somehow registered testdouble twice!');
  }

  const spawnWorker = () => {
    const workerChannel = new MessageChannel(); 
    // > if I have _two_ workers that want to communicate with the loader.
    // You can pass ports around – one that goes to the loader and one to the worker.
    new Worker(import.meta.url, {
       workerData: {port: workerChannel.port1},
       transferList: [workerChannel.port1],
    });
    controlChannel.port1.postMessage(
      {type: 'NEW_WORKER', port: workerChannel.port2}, 
      [workerChannel.port2]
    );
  };
  
  // Do what you need to do.
  spawnWorker();
  spawnWorker();
  spawnWorker();
} else {
  workerData.port.postMessage({
    type: 'THIS_GOES_TO_LOADER'
  });
}

`testdouble-super-loader.js`

const handleMessage = (msg) => {
  switch(msg.type) {
  case 'NEW_WORKER':
    msg.port.on('message', handleMessage);
    break;
  case 'THIS_GOES_TO_LOADER':
    // YOUR LOGIC HERE FROM YOUR WORKERS
    break;
  }
}

// > The question (or is it a thinly-veiled ask? 😁) is this: how can I 
// > know whether I already registered the loader or not?
// You _CAN_ know this here; `initialize` is called each time the 
// loader is registered.
//
// *IMPORTANT*: There is currently no way for this value to be
// mucked up, but if/when `deregister` lands to remove a loader
// then we may need a `deinitialize` hook or similar to keep the
// value here correct.

let lock = 0;

const initialize = (data) => {
  if (lock > 0) {
    // Loader has already been registered!
    return {error: true};
  }
  lock++;
  data.port.on('message', handleMessage);
  return {error: false};
}

Aug 21 '23 20:08 izaakschroeder

For what it's worth I'm also not against exposing a getLoaders() type method on node:module. I don't think it would be hard to implement and we could have it store the results of initialize across the thread bridge.

register(originalSpecifier, parentURL, data, transferList) {
  const id = ''; // TBD: Determine how to pull this out.
  this.loaders[id] = hooksProxy.makeSyncRequest('register', transferList, originalSpecifier, parentURL, data);
  return this.loaders[id];
}

deregister(id) {
  const result = hooksProxy.makeSyncRequest('deregister', undefined, id);
  delete this.loaders[id];
  return result;
}

getLoaders() {
  return Object.values(this.loaders);
}

Aug 21 '23 21:08 izaakschroeder

Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?

Aug 21 '23 21:08 JakobJingleheimer

Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?

I'm… not sure… I guess we would probably want to define an interface for what it returns… I assume at least the id… but maybe more.

Aug 21 '23 22:08 izaakschroeder

has() would be safer than getLoaders(), since programs might be relying on the function they pass in not being reachable by userland code (same reason browsers don't have a "get all event listeners" function)

Aug 21 '23 22:08 ljharb

I don't want to create APIs without a clear use case. If loader authors can solve this problem themselves with a trivial amount of code, I'd leave it at that until it becomes clear that the existing solution is insufficient somehow.

Aug 21 '23 23:08 GeoffreyBooth

loaders loaders copied to clipboard

Replace globalPreload

testdouble-super-register.js

testdouble-super-loader.js

loaders
loaders copied to clipboard

`testdouble-super-register.js`

`testdouble-super-loader.js`