loaders
loaders copied to clipboard
Helpers design doc
I started writing a design doc for the helper functions we’re considering adding, to help people reimplement only small portions of hooks (see README). The idea is that if someone wants to change a small part of what happens within Node’s internal resolve, for example, rather than copy/pasting all the code from Node’s internal resolve and changing one thing (and then this loader getting outdated with the next version of Node where Node’s internal resolve changes) we provide lots of functions that make up all the steps of Node’s resolve, and the hook author can sting those together with whatever minor changes they want as part of the user resolve hook.
On a practical level this will mean making public many of the functions in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js, and splitting some of the larger ones into smaller ones that become public. Because many functions reference other helper functions, I’m thinking of a pattern where most of these internal calls become references that get passed in. So for example within packageResolve there’s a call to parsePackageName; the latter can be passed in as an argument to packageResolve, if the user desires to override parsePackageName within packageResolve. Leaving it undefined would mean that Node’s default internal parsePackageName would be used.
I’m opening this PR as a draft to get initial feedback on this general approach. If you all think it’s good and the best way to address how to make these functions into public APIs, then I’ll keep going on this design doc and try to define all of them (at least for resolve, for now) and then we can implement this.
Does this design account for composing two loaders that are both trying to use these helpers?
Decorator pattern can work well for stuff like this. Decorator pattern could actually work for all hooks: these helpers, existing hooks, and the fs hooks.
// my-loader.mjs
export function create(hooks: NodeLoaderHooks) {
return {
...hooks,
resolve(url: Url, context) { /* resolve hook goes here. Can call `hooks.resolve()` if appropriate */ },
fileExists(path: string) { /* augment fileExists */ },
findPackageJson(packageName: string, base: string, isScoped: boolean) { /* ... */ }
// All other hook behaviors are unmodified by this loader. Other loaders might modify them
}
}
The snippet above has them all on a single namespace, but could also work splitting into multiple namespaces.
export function create(hooks) {
...hooks,
fs: {
...hooks.fs,
fileExists() { /* augment fileExists */ }
}
}
Does this design account for composing two loaders that are both trying to use these helpers?
The idea is that each helper would be a “pure” function, operating only on its input; hence the need for passing in a lot more input, from references to other functions to also containers of state like the module cache. But once every helper function is pure and stateless, they should be able to be used by any loader in a chain without conflict.
I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.
I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.
Presumably, with ambiant loaders (I'm working on them), one loader could just override the node:loader_utils built-in module to point to its custom implementation, right?
Presumably, with ambiant loaders (I’m working on them), one loader could just override the
node:loader_utilsbuilt-in module to point to its custom implementation, right?
This isn’t a use case I considered, like say if you want to override a helper like packageResolve and have your custom version be used by all following loaders; but I suppose it should be possible, since loaders can override/replace/mock builtin imports. Loaders probably wouldn’t and shouldn’t be able to override the version of packageResolve that Node itself uses within its internal resolve, whether or not that gets called, because we have lots of protections against userland code messing with Node’s internals. The point of creating these helpers isn’t for them to be overridden, it’s for them to be used within user-defined hooks.
I’m still hoping to avoid needing lots of defined hooks, like separate hooks for every filesystem operation etc. For now my goal is only to reduce boilerplate among hooks that users are writing, not provide even more ways to hook into Node’s internals (at least, not as part of this effort).
Getting back to my original question, does anyone have any feedback about this general design pattern?
To override node:loader_utils in an ambient loader, do you imagine it aliases the existing node:loader_utils -- possibly coming from another loader in the chain -- to a new specifier? And then emits its own code that imports from that new specifier?
Something like:
import * as wrappedHelpers from 'loader_utils_generated_alias_46873'; // avoid collisions
export function findPackageJson() { wrappedHelpers.findPackageJson(/* augmentations */) /* augmentations */ }
If helper A calls helper B, calls helper C, then helper C's dependencies should be pass-able to helper A. So that any helper can accept a Partial<Helpers> which will apply to all direct and transitive operations.
I dunno if there's any performance difference between a single createHelpers(helpers) factory / class and passing a helpers object into each function. Some of these might be hot codepaths. I've read node's stance on performance considerations.
To override
node:loader_utilsin an ambient loader
I was thinking that each helper would be available on module, as in import { packageResolve } from 'module'. So you could override them like any other import.
But please, let’s stop focusing on overriding helpers. The question is whether the API I’ve started to sketch out in this draft PR is good or could be improved.
It might help to get a usage example for invoking packageResolve with a custom implementation of tryStatSync, to make sure we're on the same page as far as intended usage in those situations.
It might help to get a usage example for invoking
packageResolvewith a custom implementation oftryStatSync, to make sure we’re on the same page as far as intended usage in those situations.
I’ve added an example of the intended/expected usage: https://github.com/nodejs/loaders/pull/94/commits/0d263e3b7c9ff83812aa02a96e99cbbf47d7e444.
I’ll add another example with a custom implementation of one of the child functions.
The situation I'm thinking of is: what's intended implementation when tryStatSync needs to be customized within a call to packageResolve? Knowing that packageResolve is going to call findPackageJson, which will call tryStatSync
The situation I’m thinking of is: what’s intended implementation when
tryStatSyncneeds to be customized within a call topackageResolve? Knowing thatpackageResolveis going to callfindPackageJson, which will calltryStatSync
At the moment I can’t think of a realistic example for why you’d want to customize tryStatSync, but assuming you had one, it would be something like this (within the example I linked to above):
const tryStatSync = () => true; // Or whatever
const pathToPackage = fileURLToPath(packageResolve(specifier, context.parentURL, context.conditions, {
tryStatSync,
}));
I'm thinking about virtual filesystem situations. Not sure if that's intended to be done in this way, or via the virtual filesystem hooks.
But I like that we got clarification on the ability to pass transitive dependencies into a helper. So a single overrides object can be passed into every helper invocation, ensuring a consistent set of overrides are used for each (transitive) operation.
Fwiw my thinking for virtual filesystems is that we would be able to override node:fs the same way as the hooks.
One note: perhaps it'd be simpler if the helpers too a fs reference rather than each individual function. This way users could just do:
import fs from 'node:fs';
packageResolve(..., {fs});
Rather than (which could get unwieldy if there was multiple FS IO slots):
import {tryStatSync} from 'node:fs';
packageResolve(..., {tryStatSync});
Fwiw my thinking for virtual filesystems is that we would be able to override
node:fsthe same way as the hooks.
I think the question here is whether you want to override all filesystem calls, including app code that uses fs, or only file IO that happens as a result of import and require. If the latter, then you should create your set of overrides for readFile, readFileSync and so on, and pass your new versions of those functions into the helpers like I have tryStat in this example. Then the virtual filesystem you’ve created would apply to module loading only, and an app could still access files on the real hard disk etc.
If you really want to override all file IO, you could just replace the fs module itself via a loader. When a resolve hook sees fs or node:fs or its variations, something other than the Node builtin gets returned instead. We wouldn’t need to create distinct hooks for all the filesystem methods, I don’t think (or at least, try to get it to work with the hooks we have before proposing to add more).
It's still my understanding that the thought experiment from https://github.com/nodejs/loaders/pull/89#issuecomment-1162114548 requires composing helper functions from two different loaders into one. This design by itself doesn't seem to fill that need. So for this design to make sense, I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.
I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.
Can you explain a bit what this requirement is? There’s a lot of stuff in that comment you’re linking to.
My goal with these helpers is just to reduce boilerplate. You could use them in a sorta-compositional way by having one loader override these helpers, so that when the next loader imports them they get the custom versions. So if the first loader uses resolve and load to return custom source whenever packageResolve is imported from 'module', this custom packageResolve would be what gets used by the next loader in the chain (once we get the “ambient loaders” idea working). Maybe this isn’t robust enough for your needs? I’m not sure what exactly you need, or what use case you’re trying to solve.
These helpers aren’t meant to solve all outstanding feature requests for loaders. It’s just one part of the roadmap. If there’s anything about creating these helpers that hinders any future use cases, or any way that the design of these helpers can be improved to better enable those use cases, please make specific suggestions on what to change.
The API needs more work, but overall these seem like positive directions.
One approach might also be to just start simple - I don't think anyone would reject exposing isBareSpecifier. Could well be an easy way to start.
With package boundary functions, there are some questions as to whether it should directly return configuration, or if you should use path-based APIs primarily - getPackageBoundary(url) -> boundary + getPackageConfig(boundary) versus getPackageConfig(url) -> { config, boundary } sort of thing. Minor stuff but some design decisions in play.
Note that packageResolve doesn't return the path to the package in node_modules/pkg/ but instead handles the full resolution of a package including exports resolution etc etc. That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.
Perhaps having packageResolve mean what you thought it meant would work better for a helper. But that design also needs to be incorporated into how you then handle resolution within the package. A separate resolvePackageExport(subpath, packageUrl, conditions) can work. Although the paths for own-name and imports are still left out of this design flow which might make it easy for users to forget these if they are trying to write their own routines. So this gets a little harder to decompose well in a way that doesn't just expose partial functionality - I think with these pieces we need to ensure we have all the pieces in the design before exposing any one.
The yarn team might have some good insight on the design, since they have similar functions in their yarn PnP API.
For example, referring to https://github.com/nodejs/loaders/pull/94#issuecomment-1193385401:
Note that
packageResolve... handles the full resolution ... That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.
The latter exists in yarn's PnP API, named resolveToUnqualified. It has a well-scoped responsibility: it resolves a specifier to a package's root path, but does not attempt sub-path resolution. (IIRC, hopefully @arcanis can correct me if I'm wrong)
https://github.com/nodejs/node/blob/a506aa76a8cfb6438a0b4bc7604623bc6518a7c9/lib/internal/modules/esm/resolve.js#L774-L806
This I think clearly matches the way yarn and ts-node need to interop.
- yarn is responsible for resolving
foo/barto/path/to/pnp/virtual/foo.zipor/monorepo/packages/foo.resolveToUnqualifieddoes that. ts-nodeis responsible for resolving from there to/monorepo/packages/foo/src/mod.tsbased on various rules for mapping between file extensions,outDir->rootDir, CommonJS support.
So when pnp and ts-node hooks are both active, the resolver should use:
- yarn's implementation of
resolveToUnqualified - ts-node's implementation of
legacyMainResolve
I acknowledge that I'm jumping ahead, because I'm talking about composing hooks from two loaders. But I think it's still relevant to the design of these helpers.
yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.
In this case it would resolve to /path/to/pnp/virtual/foo.zip/bar
https://yarnpkg.com/advanced/pnpapi/#resolvetounqualified
Regarding augmenting functionality, could these helpers be implemented in ESM? If so, then couldn't loaders be used to customise their dependencies?
// exposed on node:module
import { whatever } from 'node:fs';
export default function packageResolve(…) { … }
// custom-loader.mjs
import * as td from 'testdouble';
export async function load(url, { parentURL }, nextLoad) {
await td.replaceEsm('node:fs', { whatever: fakeWhatever });
const { packageResolve } = await import('node:module');
// …
}
function fakeWhatever() { … }
Regarding augmenting functionality, could these helpers be implemented in ESM? If so, then couldn’t loaders be used to customise their dependencies?
At the moment, at least, all internal code must be CommonJS. As far as I understand it (and I’m sure this is a simplification), we’d have to rebuild how the internal code gets assembled to be passed into V8 at startup for it to become ESM, since it’s not like internal code uses either the CJS or ESM loaders to run.
Is exposing the current logic as pieced helpers the right approach? I see two use cases that relate to this:
- Customizing the way node loads packages: involves making arbitrary changes to node's resolve/load hooks. That's fine if it's completely new logic, but doing small tweaks of existing logic is very cumbersome and this seems like is the focus of OP
- Leveraging some node logic to get package information: some code (likely a framework) wants to find the path of a package that node would load (or the non-deferenced symlinked path[2], or its package.json location[1], or more generally all resolved export, or the actual extension, etc). Note this is not the same as
import.meta.resolve(or evenrequire.resolve)
I can see how helpers would be a perfect fit for (2), but just exposing the logic without allowing for overriding it would still cause a lot of rewriting and maintenance for use case (1), no? A few examples:
- A customer loader that exposes a synthetic endpoint (e.g. "./myframework-metadata") would only need to a different
packageExportsResolve, but has to re-write - almost verbatim - the functions leading to it:packageExportsResolve <- packageResolve <- moduleResolve <- defaultResolve - @cspotcode 's typescript loader needs to know how to resolve to ".ts"-like extensions which means customizing
finalizeResolutionandlegacyMainResolve, but in reality ts-node also has to basically copy the whole code changing the calls
Essentially, the deeper in the call stack the behavior trying to be customized is, he more replication there will be. A simple wrapper around the helpers would allow to change references to point to custom functions but it would be patching the internal loader and is hacky. As a compromise couldn't node expose the a DefaultLoader prototype it uses itself to instantiate a loader object used as the default loader (i.e. today's defaultResolve would call defaultLoader.resolve, and so forth)?
To a limited extent, this is what require does with the Module prototype with its setters and getters, but ESM could use actual functions and be more consistent defining everything in a prototype. A few things this would achieve:
- It would allow users to override specific methods of
DefaultLoaderdown the prototype chain to instantiate their own custom loaders - Users could still instantiate the
DefaultLoaderthemselves to use node's internal logic as helpers, or any other loader in their dependencies really - The actual
defaultResolveinstance used internally by node doesn't have to be exposed - The API surface would be basically the same as helpers, just packaged differently
I'd image this is achievable through adapters and shims without much change to the current public spec. But it could be taken further later with decomposed prototypes for resolve and load, loader chaining could the use object references as nextResolve and nextLoad, there could be a way to get the an instance of the current prototype being used that would generalize import.meta.resolve into just createLoader().resolve, documented list of "stable" methods
[1] https://github.com/nodejs/node/issues/33460 [2] https://github.com/nodejs/node/issues/3402
couldn’t node expose the a
DefaultLoaderprototype it uses itself to instantiate a loader object
No, because then the entire class becomes a public API surface and we could essentially never change any part of it without those changes being breaking. This is what happened with CommonJS monkey-patching and is why ESM is comparatively locked down.
Well, all the internal CJS resolver functions are still private, so that's not the best example.
Yeah, I don't think this is the same as require. CJS implementation allows for ad-hoc customization of certain behavior by using a Module prototype to get/set function references used in their private logic. I'm sure there were good reasons for it but with the benefit of hindsight, it doesn't look right.
I'd separate the concern into two parts:
- Exposing internal logic so users can use as helpers or selectively override on their own loaders: means that logic has to be backward compatible and thus harder to evolve and maintain
- Allowing change of current logic via monkey-patching: means we could have code stepping onto each other toes
The suggestion is to expose the prototype and not the instance used by node, and either can be frozen so (2) is not a problem. Customization would happen through prototype extension into a new class/prototype that would then be used in custom loaders. Everything is immutable.
As for (1), the API surface could be made to match whatever helpers were going to be created. Maintainers would choose what logic makes the "stable"/"public" cut, and the rest could be private. Granted, the more that makes to the prototype, the more useful it would be and if that's small enough there's no benefit. But it would still have the benefit of later be dialed up as see fit.
My guess is the biggest risk of having more rather than less logic as part of the prototype is for use cases of using it as helpers. If used to create custom loaders, the authors of those libraries are fairly aware they are working at a lower level, documentation could state the stability of each method (much like it does today for some features), and the library users would essentially have to enable a feature flag to use them (i.e. --loader).
The API surface for the helper use case and custom loader case doesn't even have to be the same. The custom loader prototype could be passed to a factory method that node calls to create custom loader, and only a subset of it - possibly even keeping with the named function exports idea, but still using the prototype instance behind the scenes - could be available for importing to use as helper. This would be quite a break tho.