loaders
loaders copied to clipboard
Replace globalPreload
This hook is extremely confusing, even for maintainers.
The replacement needs to address:
- cross-thread communication
- determining what the entry-point is (since it's stripped from
process.argv) as identified by https://github.com/nodejs/help/issues/4190 (eg an ESM equivalent ofrequire.main)
Assuming there is 1 loader worker.
Spit-balling ideas:
- expose a message port on
node:module - return a message port from
register()
Will a message port need to be refreshed?
@giltayar,I would be very interested in your thoughts (or even a design proposal) for this, especially because you recently handled this in testdouble, so the pain is fresh 😅
The most intuitive way for me is that the register returns a port that is used to communicate with this loader (how is this done today? It can't be the same port for all the loaders, and yet they all run in the same worker). In this option, we would have a function in the loader (initialize) which will accept the port to be used to communicate.
Another variant of this option is for the register option to accept a MessageChannel (instead of creating it and returning it), which the user of register can use to communicate with the loader. The same initialize function can be used (we can even call globalPreload with the port for backward compatibility! 😁). The nice thing about this option is that if the user doesn't pass the MessageChannel to the register, then it signals that there is no need to call initialize and no need for MessagePort support.
In addition to passing a MessageChannel, the register should allow passing any data whatsoever to the initialize, e.g. the userland code can now pass process.argv to the loader to satisfy the requirement in https://github.com/nodejs/help/issues/4190. And actually, the user of register can just create a MessageChannel, pass the port2 via this data (it is a transferable object) and that's it: no need for any Node.js code for ports.
To summarize the above and define what is for me the best option: register can pass any data it wants to the loader, which will be passed to the exported initialize function of the loader. Additionally, if the user of register wants to communicate with the loader, it can just create a MessageChannel and pass the port to the loader as data.
I like that idea! I made a quick PoC here: https://github.com/targos/node/commit/c4f5e91ca4d4219e13f6f18f14b6101a26f96be5
$ ./node --no-warnings --loader "data:text/javascript," poc.mjs
initialize loader { entryPoint: '/Users/mzasso/git/nodejs/node/poc.mjs' }
loader initialized
that'd be amazing! PR?
Sweeeet!
Lemme get the existing register PR stable (I think there are 1–2 small issues with it remaining), and then let's get that in 🎉
PS haven't had time to thoroughly review Gil's ideas, but I liked where they're going.
One thing to consider is that the point of the communications port isn’t just to pass data at initialization time. There are plenty of use cases where loaders need to communicate back and forth to the main thread after initialization.
Now maybe all this means is that we should rename initialize to something like onMessage, if it runs on every message, or initialize could return a function that runs on every message. But we should handle this use case in the design somehow.
Edit: I just looked at the POC and I see the intent, that you’re passing in the port and can register callbacks to it independently from us needing to provide infrastructure for them. I think it can work.
Before converting this into a full-fledged PR, can someone update the docs in the POC so that I can see what the intended UX is supposed to be? I think it would be good to workshop that before someone goes and implements it. What I think is the proposed UX seems fine to me at first glance, but having it spelled out in the instructions we would write for loader authors would help me fully understand it.
@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.
Happy to help in shipping this.
@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.
Happy to help in shipping this.
I just got back from vacation.
Before I left, there appeared to be 1 issue in https://github.com/nodejs/node/pull/48439 related to sequence that I didn't have time to resolve. Picking back up this week (you're welcome to take a gander at it too).
PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.
PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.
https://github.com/nodejs/node/pull/48559
@GeoffreyBooth @JakobJingleheimer I updated my PR to address this issue 😄 I followed the proposal from @giltayar and the signature for register is updated to be able to pass arbitrary data to a loader's initialize hook, including two additional tests demonstrating the capability. This change exists in its own commit in order to be able to be reviewed separately.
Note, however, that I opted to add an additional specific transferList parameter instead of attempting to automatically deeply inspect the sent data, so the signature for register is now:
register(loader: string, parentUrl?: string, data?: any, transferList?: any[]): unknown;
The return value from the initialize hook is sent back, but there is currently no affordance for this to also provide a transferList and thus ports or other shared buffers cannot currently be returned from initialize.
I did not remove globalPreload as I am not yet comfortable enough knowing which bits and pieces are safe to pull out or if a deprecation notice is preferred over simply deleting the whole thing.
This change exists in its own commit in order to be able to be reviewed separately.
Thank you so much! Do you mind opening this as its own PR? And it can include/depend on the earlier one. And the earlier PR can exclude the new commit so that it can be reviewed on its own.
Done!
PR is here: https://github.com/izaakschroeder/node/pull/1 I'm not sure how I can put the PR on node's repo without the base branch also living in node's repo 🤔 Would you prefer I just target node:main instead and include the previous commit? Or?
I’m not sure how I can put the PR on node’s repo
So your first PR is based on your support-nested-loader-chains branch. I suggest that you:
- Branch off of that branch in your repo, something like
support-nested-loader-chains-and-register-args, so that the two branches both point to the same commit. - Rebase the
support-nested-loader-chainsbranch to drop the newest,register-related commit. - Force-push
support-nested-loader-chainsto your repo. This will reset the PR to exclude the newest commit. - Open a new PR with
support-nested-loader-chains-and-register-argsagainstnode:mainand mention in the PR description that it builds off https://github.com/nodejs/node/pull/48559.
That’s it. This is the general procedure for splitting a large PR into smaller ones.
Note, however, that I opted to add an additional specific
transferListparameter instead of attempting to automatically deeply inspect the sent data, so the signature forregisteris now:
Why do we have both data and transferList as separate arguments? What’s the difference between them?
@izaakschroeder did you see/use @targos’ POC in https://github.com/nodejs/loaders/issues/147#issuecomment-1601085150?
I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.
I work at New Relic on the Node.js agent team and I'm working through an update to our loader to support ESM instrumentation. I'm finding the isolating loader to their own thread is causing a few headaches. If globalPreload is getting removed how will I be able to initialize something in the loader that is in the main thread? Is this with the new initialize or register hooks?
Also if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?(New Relic, Datadog, Elastic, etc)? In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way? I don't know if I'm too late to the party but as someone that's not directly involved in this, the idea of running loaders in separate threads is causing more headaches for vendors instead of an easy to use API to do it. I'd rather go back to monkey patching a built in like Module and getting access to packages like this. I worry that this will soon be stable and the hook points available look great on paper but when you see what has to be done within, it is creating a different problem and risk.
I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.
Hi @bizob2828 and thanks for your feedback. Yes this is a good place.
If
globalPreloadis getting removed how will I be able to initialize something in the loader that is in the main thread?
It’s the other way around: the main thread will initialize the loader. Instead of your users running --loader new-relic/loader or whatever they do for ESM now, they would instead run --import new-relic/register. And within your register module would be code that calls module.register() to register your loader hooks and get a communications port to send messages between the main and loader threads; and then do whatever other initialization you want to do on the main thread. This approach provides easy access to the main thread, which is why we’re going this way (and may phase out --loader, as it’s not really needed anymore).
if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?
Please read the whole thread at https://github.com/nodejs/node/issues/47888, this was discussed in detail. The short version is that we’re not trying to make your jobs difficult, but we’re constrained by both the ESM spec and by V8’s capabilities. However we can make things easier within our power, that doesn’t break something else, we’re happy to do—or are happy to code review PRs for. As you might have noticed, there is a very small group working on these features and we have very limited time, so you might need to volunteer some of your own time to implement some of your requests.
I’m finding the isolating loader to their own thread is causing a few headaches.
I understand, but it’s necessary for the loader hooks to eventually support CommonJS, as the only way to “syncify” our hooks to allow customizing the sync require calls is by using a separate thread. Ditto for supporting the sync import.meta.resolve API. There were many other reasons as well; the short version is that this was years in the making and it’s not some minor change that we’re open to reverting. I understand it’s annoying from your perspective, but it’s necessary to achieve all the goals we’re aiming to satisfy with the Loaders API. We’re only two PRs away from getting to the end of our road map for what we need to land in order to call the API stable; so pretty soon the churn should end and you’ll be able to rely on this without so much effort.
In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way?
This is a different but related request to my ask that we get hook access into node's CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.
I found one place that I shared it: https://github.com/nodejs/node/issues/43818#issuecomment-1184415188 I feel like it was captured into markdown somewhere, just can't remember where.
This is a different but related request to my ask that we get hook access into node’s CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.
That list got migrated into the Milestone 3 list: https://github.com/nodejs/loaders#milestone-3-usability-improvements. The CommonJS static analysis is on there, the one about cjs-module-lexer.
FYI while it’s not currently available from within Node, like import lexer from 'node:cjs-module-lexer' or something like that, it is available as a package that you can add as a dependency to your projects: https://github.com/nodejs/cjs-module-lexer.
Hello Loaders team. Will this eventually end up in all version lines, or will old versions still use globalPreload? Trying to decide whether I should keep the preload logic in my loader, or plan to just drop it completely when register shows up. My idea now is to target LTS, v16+.
The entire Loaders API is experimental, so there’s no reason that it wouldn’t get backported to all lines. That would probably be the ideal outcome, as then developers like you wouldn’t need to maintain support for variations of APIs.
In practice it’ll probably need to be stable on 20 for a while before it gets backported to 18, possibly for months or even a year, so if you’re eager to support 18 and below before then you might need to implement two versions.
v16.x reaching EOL in one month, odds are virtually 0 that it will be backported there.
I think we should keep this issue on topic (globalPreload) and not talk about the backporting of the new loader to v18 (might be worthwhile to do that in another issue)
Question:
I'm responsible for the ESM implementation of module mocking in testdouble. In that implementation we have a file testdouble.js that implements the mocking API by communicating with the loader.
What I'd like to do, so that users won't need to --import testdouble-loader.js, is to register the loader in testdouble.js. The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?
You could say that given that the registration happens when the testdouble.js module is loaded, and that doesn't happen twice, then calling register shouldn't happen and there's no need to check this.
True! But what happens when testdouble.js is loaded twice because it is used in two workers?
Hmm... this is actually an important question: if I have two workers that want to communicate with the loader. Is calling register allowed once per worker? And should the loader store an array of ports that it uses to communicate with all the workers that register-ed it? I believe it should.
(Sorry, a bit confused. But I think my train of thought here is important to understand the needs of API-s that need to communicate with loaders)
The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?
As far as what node will tell you, there's no straightforward way to determine that (ex you can't do registered.has(myLoader)). I can think of a couple heinous ways to figure it out. Where do you need to know–in import's module, in your loader, or from the app/entry-point?
True! But what happens when
testdouble.jsis loaded twice because it is used in two workers?
Do you mean node's ESM worker? I think there may currently be unintended/undesired behaviour making that possible, but we'll squash that.
if I have two workers that want to communicate with the loader.
What are these workers / where did they come from? Are they spawned by the entry point?
@giltayar maybe? I can help… I built much of the new register machinery so perhaps I could try to build out an example that fits here?
testdouble-super-register.js
import {MessageChannel, Worker, isMainThread, workerData} from 'node:worker_threads';
import {register} from 'node:module';
const TestDoubleLock = Symbol();
// > True! But what happens when testdouble.js is loaded twice because it is used in two workers?
// This will prevent testdouble.js from executing `register` multiple times from other workers
// Node _should_ generally only execute a single module once but I people could get likely
// around that if they tried. You could add an additional guard on `globalThis` which is shared
// between every module on a single V8 context (~thread).
if (isMainThread && !globalThis[TestDoubleLock]) {
globalThis[TestDoubleLock] = true;
const controlChannel = new MessageChannel();
const result = register('testdouble-super-loader.js', {
parentURL: import.meta.url,
data: {port: controlChannel.port2},
transferList: [controlChannel.port2],
});
if (result.error) {
throw new Error('Somehow registered testdouble twice!');
}
const spawnWorker = () => {
const workerChannel = new MessageChannel();
// > if I have _two_ workers that want to communicate with the loader.
// You can pass ports around – one that goes to the loader and one to the worker.
new Worker(import.meta.url, {
workerData: {port: workerChannel.port1},
transferList: [workerChannel.port1],
});
controlChannel.port1.postMessage(
{type: 'NEW_WORKER', port: workerChannel.port2},
[workerChannel.port2]
);
};
// Do what you need to do.
spawnWorker();
spawnWorker();
spawnWorker();
} else {
workerData.port.postMessage({
type: 'THIS_GOES_TO_LOADER'
});
}
testdouble-super-loader.js
const handleMessage = (msg) => {
switch(msg.type) {
case 'NEW_WORKER':
msg.port.on('message', handleMessage);
break;
case 'THIS_GOES_TO_LOADER':
// YOUR LOGIC HERE FROM YOUR WORKERS
break;
}
}
// > The question (or is it a thinly-veiled ask? 😁) is this: how can I
// > know whether I already registered the loader or not?
// You _CAN_ know this here; `initialize` is called each time the
// loader is registered.
//
// *IMPORTANT*: There is currently no way for this value to be
// mucked up, but if/when `deregister` lands to remove a loader
// then we may need a `deinitialize` hook or similar to keep the
// value here correct.
let lock = 0;
const initialize = (data) => {
if (lock > 0) {
// Loader has already been registered!
return {error: true};
}
lock++;
data.port.on('message', handleMessage);
return {error: false};
}
For what it's worth I'm also not against exposing a getLoaders() type method on node:module. I don't think it would be hard to implement and we could have it store the results of initialize across the thread bridge.
register(originalSpecifier, parentURL, data, transferList) {
const id = ''; // TBD: Determine how to pull this out.
this.loaders[id] = hooksProxy.makeSyncRequest('register', transferList, originalSpecifier, parentURL, data);
return this.loaders[id];
}
deregister(id) {
const result = hooksProxy.makeSyncRequest('deregister', undefined, id);
delete this.loaders[id];
return result;
}
getLoaders() {
return Object.values(this.loaders);
}
Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?
Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?
I'm… not sure… I guess we would probably want to define an interface for what it returns… I assume at least the id… but maybe more.
has() would be safer than getLoaders(), since programs might be relying on the function they pass in not being reachable by userland code (same reason browsers don't have a "get all event listeners" function)
I don't want to create APIs without a clear use case. If loader authors can solve this problem themselves with a trivial amount of code, I'd leave it at that until it becomes clear that the existing solution is insufficient somehow.