node icon indicating copy to clipboard operation
node copied to clipboard

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching

Open joyeecheung opened this issue 11 months ago • 47 comments

Spinning off from https://github.com/nodejs/node/pull/51977

Background

There has been wide-spread monkey-patching of the CJS loader in the ecosystem to customize the loading process of Node.js (e.g. utility packages that abstract over the patching and get depended on by other packages e.g. require-in-the-middle, pirates, or packages that do this on their own like tsx or ts-node). This includes but is not limited to patching Module.prototype._compile, Module._resolveFilename, Module.prototype.require, require.extensions etc. To avoid breaking them Node.js has to maintain the patchability of the CJS loader (even for the underscored methods on the prototype) and this leads to very convoluted code in the CJS loader and also spreads to the ESM loader. It also makes refactoring of the loaders for any readability or performance improvements difficult.

While the ecosystem can migrate to produce and run real ESM gradually, existing tools still have to maintain support for CJS output (either written as CJS, or transpiled from ESM) while that happens, and the maintenance of CJS loader is still important. If we just provide a universal API that works for both CJS and ESM loading customizations, tooling and their users can benefit from a more seamless migration path.

Why module.register() is not enough

The loader hooks (in the current form, module.register()) were created to address loading customization needs for the ESM loader, so they only work when the graph is handled by the ESM loader. For example it doesn't work when the require() comes from a CJS root (which could be a transpiled result), or from createRequire(import.meta.url). Addressing existing use cases in the CJS loader is not in the scope of the loaders effort, either (it was brought up before was dismissed in a previous PR too). Therefore tooling in the wild still have to maintain both monkey-patching-based hooks for CJS and loader hooks for ESM when they want to provide universal support. Their users either register both hooks, or (if they know what format is actually being run by Node.js) choose one of them.

The module.register() API currently forces the loader hooks to be run on a different worker thread even if the user hooks can do everything synchronously or need to mutate the context on the main thread. While this simplifies de-async-ing of loader code to some extent when they want to run asynchronous code in loader hooks, the value is lost once the loader needs to provide universal support for CJS graphs and has to maintain synchronous hooks for that too (and in that case, they could just spawn their own workers to de-async, e.g. what @babel/register does on top of require() monkey-patching).

For tooling that only has CJS support via require() monkey-patching, if they want to add ESM support, this unconditional worker abstraction as the only way to customize ESM loading makes wiring existing customizations into ESM more complicated that it needs to be. The move to unconditional workers also lead to many issues that are still unaddressed:

  • Forcing loaders to be registered on a different worker thread and having to block the main thread for it makes the hooks prone to accidental deadlocks
  • When users want to spawn worker or child processes, having no control over the de-async loader worker leads to usability issues like https://github.com/nodejs/node/issues/47747 and https://github.com/nodejs/node/issues/47615 and this just looks like a rabbit hole
  • Debugging a worker is harder and more confusing than just doing everything on the main thread: e.g. not being able to see console.log() in the hooks
  • It imposes an performance overhead from worker startup and messaging that users cannot opt out of, even if their code don't need to be run off-thread in anyway. They would also have to mutate the context on the main thread via messaging but not everything can cross the boundary.

For us maintainers, having to support this worker setup as the only way to customize module loading also adds maintenance burden to the already convoluted loaders. It is already difficult to get right in the ESM loader (e.g. having to doge infinite worker creation or having to figure out how to share the loader worker among user workers), let alone in the monkey-patchable CJS loader.

Proposal of a synchronous, in-thread, universal loader hooks API

As such I think we need something simpler than module.register() that:

  1. Allow users to customize module loading universally - both CJS and ESM, no matter how they are loaded.
  2. Easy to migrate to from existing require() monkey-patching-based hooks.
  3. Allow us to fully deprecate require() monkey-patching sooner.

This becomes more important now that we are on a path to support require(esm) and want to help the ecosystem migrate to ESM by providing a path with backwards-compatibility and best-effort interop, instead of providing features that does not work in the existing CJS loader or goes against existing CJS usage patterns, making it difficult for people to migrate.

I propose that we just add a synchronous hooks API that work in both the CJS and the ESM loader as a replacement for the monkey-patchability of require(). The API can be something like this - this is just a straw-person sketch combining existing module.register() APIs and APIs in npm packages like pirates and require-in-the-middle. The key is that we should keep it simple and just take synchronous methods directly, and apply them in-thread.:

// If users want to do something off-thread, they can spawn their own workers here.
// Or import { addHooks } from 'node:module'
require('module').addHooks({
  // Useful for transpiler support of new extensions, etc.
  resolve(specifier, context, nextResolve) { ... },
  // load() + resolve() should be enough to provide what pirates need
  load(specifier, context, nextLoad) {
    // If users want to do things off-thread, e.g. to generate
    // or fetch the source code asynchronously, they could run it
    // in their own worker and block on it using their own
    // Atomics.wait().
  },
  // Should be enough to provide what require-in-the-middle need
  export(specifier, exports, context, nextExport) { ... }
  // Other hooks that we find necessary to replace `require()` monkey-patching
});

The main difference between this and module.register() is that hooks added via module.register() are run on a different worker unconditionally, while module.addHooks() just keeps things simple and runs synchronous hooks synchronously in-thread. If users want to run asynchronous code in the synchronous hooks, they can spawn their own workers - this means technically they could just implement what module.register() offers on top of module.addHooks() themselves. So module.register() just serves as a convenience method for those who want to run the code off-thread and prefer to delegate the worker-atomics-wait handling to Node.js core.

In a graph involving real ESM, module.register() can work in conjunction to module.addHooks(), the hooks are applied in the same order that they are added in the main thread. In a pure CJS graph, module.register() continues to be unsupported, as what's has already been happening. Maybe someday someone would be interested in figuring out how to make module.register() work safely in the CJS loader, but I think the burden from the handling the unconditional workers is just not worth the effort, especially when users can and already do spawn their own workers more safely for this use case. IMO a simple alternative like module.addHooks() would be a more viable plan for universal module loading customization, and it gets us closer to deprecating require() monkey-patching sooner.

Migration plan

Tooling in the wild can maintain just one set of synchronous customizations, and handle the migration path by changing how these customizations are wired into Node.js:

  1. On older versions of Node.js that they still need to support, they continue to fall back to customization wiring via both paths: into CJS via require()-monkey patching, and into ESM via module.register(). This is unfortunately what they already do today if they want to provide universal module support.
  2. On newer versions of Node.js, they can wire the same set of customizations into both CJS and ESM via just one path: module.addHooks(). There is no longer need to maintain two wiring for universal support of CJS and ESM. And, if they didn't support real ESM before, they get to implement ESM support relatively simply by just migrating from require() monkey-patching to module.addHooks().
  3. When they drop support of older versions that lack module.addHooks(), they can remove dependency on require() monkey patching completely.

For us, the migration plan looks like this:

  1. Provide module.addHooks() as a replacement for require() monkey-patching, and make it wired into both require()/createRequire() from the CJS loader and import/require in imported CJS from the ESM loader
  2. Doc-deprecate require() monkey patching and actively encourage user-land packages that rely on patching to migrate. At the mean time require() monkey patching will still work to some extent in conjunction with the new loader hooks, so that packages have a graceful migration period.
  3. When the dependencies on require() monkey patching drop enough in the ecosystem, start emitting runtime warnings when the internal properties of Module are patched, and suggesting to use module.addHooks().
  4. After that we will no longer maintain compatibility hacks for require() monkey patching to work. Packages who monkey-patch Module but don't manage to migrate might still work with newer versions of Node.js - until we do internal changes to the internal properties that they rely on. When we do that and break them, instead of further convoluting internals to make patching work, we'll suggest them to just use module.addHooks() on newer versions of Node.js.

joyeecheung avatar Mar 26 '24 10:03 joyeecheung