implement module.registerHooks() to run synchronous module customization hooks in thread
This implements part of the proposal in https://github.com/nodejs/loaders/pull/198 - for the motivations, see https://github.com/nodejs/node/issues/52219 - this fills in the gap where being able to run in-thread and support require() is more important for a hook than being able to run asynchronous code (even then, packages like https://www.npmjs.com/package/everysync allows user to sync-ify async code off-thread themselves), especially for a large amount of CJS loader moneky-patchers out there to migrate to an officially supported API easily.
The synchronous, in-thread hooks are also a lot easier to support universally in the loaders and reduce the number of dark corners where the hooks cannot run or behave in a surprising manner - I guess the amount of "this caveat of asynchronous hooks does not apply to the synchronous hooks" added in the documentation kind of also already proves the point...
For now only resolve and load are implemented to reach parity with module.register(). I left the exports hook to https://github.com/joyeecheung/node/tree/export-hooks which will be a follow-up when we decide the timing at which the post-load hook for CJS should run.
The commit module: use synchronous hooks for preparsing in import(cjs) isn't strictly necessary and could split into a follow-up instead, but I think it's nice to have it here instead of leaving out that corner uncovered.
Review requested:
- [ ] @nodejs/loaders
- [ ] @nodejs/startup
Added a lot of tests. This should now be ready for review.
CI: https://ci.nodejs.org/job/node-test-pull-request/63632/
CI: https://ci.nodejs.org/job/node-test-pull-request/63633/
Codecov Report
Attention: Patch coverage is 98.37278% with 11 lines in your changes missing coverage. Please review.
Project coverage is 88.54%. Comparing base (
dbfcbe3) to head (9529784). Report is 94 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #55698 +/- ##
==========================================
+ Coverage 88.50% 88.54% +0.03%
==========================================
Files 656 657 +1
Lines 189261 189858 +597
Branches 36346 36451 +105
==========================================
+ Hits 167508 168101 +593
- Misses 14969 14976 +7
+ Partials 6784 6781 -3
| Files with missing lines | Coverage Ξ | |
|---|---|---|
| lib/internal/modules/customization_hooks.js | 100.00% <100.00%> (ΓΈ) |
|
| lib/internal/modules/esm/module_job.js | 99.25% <100.00%> (+<0.01%) |
:arrow_up: |
| lib/internal/modules/helpers.js | 98.84% <100.00%> (-0.01%) |
:arrow_down: |
| lib/internal/modules/esm/translators.js | 92.93% <94.11%> (-0.17%) |
:arrow_down: |
| lib/internal/modules/esm/loader.js | 97.80% <94.44%> (-0.23%) |
:arrow_down: |
| lib/internal/modules/cjs/loader.js | 98.10% <97.24%> (-0.19%) |
:arrow_down: |
I do have one feature request I'd love to see here though - a one-way function to disable hooks for security module.lockHooks(), with no arguments, that disallows further register or registerHooks calls.
That seems out of scope of this PR? If it was fine not having it for module.register, why does it need to happen in this PR?
Having a hook lockdown phase is not a requirement for this PR by any means, but would be a nice to have to clarify the early hook registration model sooner rather than later.
Having a hook lockdown phase is not a requirement for this PR by any means, but would be a nice to have to clarify the early hook registration model sooner rather than later.
I feel that should be something that's up to the discretion of the hook author, there can surely be use cases where one wants to avoid inheritance in third-party child processes/workers, or just to avoid infinite recursion in case the code run in the loader forks itself, though we could mention it in the documentation for sure.
Right, so it's our job to educate hook authors how to write hooks to support multi threading (or not) and how to avoid recursive application etc. etc.
With regards to hook "lockdown" I think this is something that the application owner would care about - how do I know that an arbitrary npm package import isn't going to be doing hooks without me knowing about it? How do I know I'm not depending on a package that's automatically adding a "jsx" hook and changing the way my local code runs? A lockdown feature is for the application code owner's benefit. Ideally lockdown by default would be great too to ensure we don't enter a Node.js future where hooking the loader is common for library dependencies.
Right, so it's our job to educate hook authors how to write hooks to support multi threading (or not) and how to avoid recursive application etc. etc.
While I agree better documentation should exist, I think we shouldn't really dive into too much details in the API documentation. It should either be in the learn section in the website, or just have dedicated repository for tutorials + examples like https://github.com/nodejs/node-addon-examples and the new https://github.com/nodejs/package-examples that we are spinning (I also left comment in this PR that I think the example loaders should be moved in its own repo, and refrained from adding too many complementary examples for the new API, I also noticed that some existing examples in the documentation didn't even work, but it felt out of scope to fix them here and IMO they should really just be tested properly in a dedicated repo).
@joyeecheung to give an example, I was not aware --import applied for all threads and this came as a surprise to me personally. This information is valuable.
Yes, though not all valuable information belongs in the API documentation ;) One can argue that https://nodejs.github.io/node-addon-examples/ is full of valuable information, but it's better to have a dedicated tutorial on a complex topic like this, instead of trying to elaborate on it in the API documentation. I think we can make do with one or two sentence about --import for this PR, and leave detailed explanation as a TODO, and don't pile even more information that should be better hosted elsewhere into the API documentation. If you take a look at https://nodejs.org/api/module.html you probably will also notice how tiring and unapproachable it is to have all that information piled into one page, people might never notice all the scattered details and recommendations simply because it's not a great read and kinda odd to have a tutorial taking 1/2 of the page, with the rest just being straightforward API documentation or just full of pointers to other docs instead of examples.
I tried moving the docs of sync hooks first, but it turned out to be huge amount of work, because most of the docs shared for both starts with an example of the async one and the text & example now all have to be rewritten :( The doc overhaul is way too much work for a initial semver-minor PR that's just trying to add a variant that's less broken...(that, and also I don't think most of the original docs belong in the API documentation in the first place, so I feel like I am just wasting more effort on texts in a document where they don't belong).
I think I've addressed most of the comments so far, except the one about changing the documentation to make sync hooks go first, because I think that would lead to an overhaul of the documentation as most of the explanatory texts are talking about async hooks examples and this PR mostly just append something saying what's different in sync hooks and that is not a huge amount of changes. But actually talking about sync hooks first means all of the texts and examples need to be reworded otherwise it looks incorrect logically. Also this is just an API that's not yet released and I feel that we could do the reordering when we have it more battle tested.
Other than the nit about doc ordering I think this is good to go - incidentally in the past 24 hours I've already seem two glitches (1, 2) coming from incorrect CJS loader monkey patching after require(esm) is unflagged on v22. So I hope we can get it over with sooner than later and stop the spread of incorrect CJS loader monkey patching in the ecosystem with this new proper API.
PTAL again, thanks @legendecas, @JakobJingleheimer, @guybedford, @GeoffreyBooth, @jasnell
Rebased to resolve conflicts in module.md (this is the second time now, kinda prove my point about what backporting nightmare it can be if the doc has to be overhauled in this PR..)
Can you take a look again? Thanks @legendecas, @JakobJingleheimer, @guybedford, @GeoffreyBooth, @jasnell
CI: https://ci.nodejs.org/job/node-test-pull-request/63946/
CI: https://ci.nodejs.org/job/node-test-pull-request/63947/
CI: https://ci.nodejs.org/job/node-test-pull-request/63960/
CI: https://ci.nodejs.org/job/node-test-pull-request/63963/
CI: https://ci.nodejs.org/job/node-test-pull-request/63964/
Landed in 2960a5954047f44b517855e57fb3a79f0a201fd3...1215a8bf124603a906c7bf748dee4f7f22cc6dd7
The https://github.com/nodejs/node/labels/notable-change label has been added by @joyeecheung.
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.
Note to releasers: please use the OP text (modulo background) for summary in the change log. The graph can be left out I suppose.
This depends on at least two refactoring PRs that landed before: https://github.com/nodejs/node/pull/54769 and https://github.com/nodejs/node/pull/55590 .
This PR does not land cleanly on v22.x-staging and will need manual backport in case we want it in v22.x.
I think it's worth backporting to v22 (since the whole point is to get more people off CJS monkey patching), though I am not 100% sure the refactoring here plays well with er, existing CJS monkey patching (I tried my best to make it play, but one never knows how crazy the patching can be). So we should wait and see how it goes in 23 for a bit before backporting.
This needs to be backported together with https://github.com/nodejs/node/pull/56402, and https://github.com/nodejs/node/pull/56454
@joyeecheung hey, I see that require.resolve is calling Module._resolveFilename,
which does not have resolveWithHooks calling. Even the opposite, resolveForCJSWithHooks is calling Module._resolveFilename.
Which means sync hooks will not intercept require.resolve, although docs implies it should work.
It seems like resolveWithHooks logic should be implemented internally in Module._resolveFilename.