loaders
loaders copied to clipboard
Helpers design doc
I started writing a design doc for the helper functions we’re considering adding, to help people reimplement only small portions of hooks (see README). The idea is that if someone wants to change a small part of what happens within Node’s internal resolve
, for example, rather than copy/pasting all the code from Node’s internal resolve
and changing one thing (and then this loader getting outdated with the next version of Node where Node’s internal resolve
changes) we provide lots of functions that make up all the steps of Node’s resolve
, and the hook author can sting those together with whatever minor changes they want as part of the user resolve
hook.
On a practical level this will mean making public many of the functions in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js, and splitting some of the larger ones into smaller ones that become public. Because many functions reference other helper functions, I’m thinking of a pattern where most of these internal calls become references that get passed in. So for example within packageResolve
there’s a call to parsePackageName
; the latter can be passed in as an argument to packageResolve
, if the user desires to override parsePackageName
within packageResolve
. Leaving it undefined would mean that Node’s default internal parsePackageName
would be used.
I’m opening this PR as a draft to get initial feedback on this general approach. If you all think it’s good and the best way to address how to make these functions into public APIs, then I’ll keep going on this design doc and try to define all of them (at least for resolve
, for now) and then we can implement this.
Does this design account for composing two loaders that are both trying to use these helpers?
Decorator pattern can work well for stuff like this. Decorator pattern could actually work for all hooks: these helpers, existing hooks, and the fs hooks.
// my-loader.mjs
export function create(hooks: NodeLoaderHooks) {
return {
...hooks,
resolve(url: Url, context) { /* resolve hook goes here. Can call `hooks.resolve()` if appropriate */ },
fileExists(path: string) { /* augment fileExists */ },
findPackageJson(packageName: string, base: string, isScoped: boolean) { /* ... */ }
// All other hook behaviors are unmodified by this loader. Other loaders might modify them
}
}
The snippet above has them all on a single namespace, but could also work splitting into multiple namespaces.
export function create(hooks) {
...hooks,
fs: {
...hooks.fs,
fileExists() { /* augment fileExists */ }
}
}
Does this design account for composing two loaders that are both trying to use these helpers?
The idea is that each helper would be a “pure” function, operating only on its input; hence the need for passing in a lot more input, from references to other functions to also containers of state like the module cache. But once every helper function is pure and stateless, they should be able to be used by any loader in a chain without conflict.
I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.
I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.
Presumably, with ambiant loaders (I'm working on them), one loader could just override the node:loader_utils
built-in module to point to its custom implementation, right?
Presumably, with ambiant loaders (I’m working on them), one loader could just override the
node:loader_utils
built-in module to point to its custom implementation, right?
This isn’t a use case I considered, like say if you want to override a helper like packageResolve
and have your custom version be used by all following loaders; but I suppose it should be possible, since loaders can override/replace/mock builtin imports. Loaders probably wouldn’t and shouldn’t be able to override the version of packageResolve
that Node itself uses within its internal resolve
, whether or not that gets called, because we have lots of protections against userland code messing with Node’s internals. The point of creating these helpers isn’t for them to be overridden, it’s for them to be used within user-defined hooks.
I’m still hoping to avoid needing lots of defined hooks, like separate hooks for every filesystem operation etc. For now my goal is only to reduce boilerplate among hooks that users are writing, not provide even more ways to hook into Node’s internals (at least, not as part of this effort).
Getting back to my original question, does anyone have any feedback about this general design pattern?
To override node:loader_utils
in an ambient loader, do you imagine it aliases the existing node:loader_utils
-- possibly coming from another loader in the chain -- to a new specifier? And then emits its own code that imports from that new specifier?
Something like:
import * as wrappedHelpers from 'loader_utils_generated_alias_46873'; // avoid collisions
export function findPackageJson() { wrappedHelpers.findPackageJson(/* augmentations */) /* augmentations */ }
If helper A calls helper B, calls helper C, then helper C's dependencies should be pass-able to helper A. So that any helper can accept a Partial<Helpers>
which will apply to all direct and transitive operations.
I dunno if there's any performance difference between a single createHelpers(helpers)
factory / class and passing a helpers object into each function. Some of these might be hot codepaths. I've read node's stance on performance considerations.
To override
node:loader_utils
in an ambient loader
I was thinking that each helper would be available on module
, as in import { packageResolve } from 'module'
. So you could override them like any other import.
But please, let’s stop focusing on overriding helpers. The question is whether the API I’ve started to sketch out in this draft PR is good or could be improved.
It might help to get a usage example for invoking packageResolve
with a custom implementation of tryStatSync
, to make sure we're on the same page as far as intended usage in those situations.
It might help to get a usage example for invoking
packageResolve
with a custom implementation oftryStatSync
, to make sure we’re on the same page as far as intended usage in those situations.
I’ve added an example of the intended/expected usage: https://github.com/nodejs/loaders/pull/94/commits/0d263e3b7c9ff83812aa02a96e99cbbf47d7e444.
I’ll add another example with a custom implementation of one of the child functions.
The situation I'm thinking of is: what's intended implementation when tryStatSync
needs to be customized within a call to packageResolve
? Knowing that packageResolve
is going to call findPackageJson
, which will call tryStatSync
The situation I’m thinking of is: what’s intended implementation when
tryStatSync
needs to be customized within a call topackageResolve
? Knowing thatpackageResolve
is going to callfindPackageJson
, which will calltryStatSync
At the moment I can’t think of a realistic example for why you’d want to customize tryStatSync
, but assuming you had one, it would be something like this (within the example I linked to above):
const tryStatSync = () => true; // Or whatever
const pathToPackage = fileURLToPath(packageResolve(specifier, context.parentURL, context.conditions, {
tryStatSync,
}));
I'm thinking about virtual filesystem situations. Not sure if that's intended to be done in this way, or via the virtual filesystem hooks.
But I like that we got clarification on the ability to pass transitive dependencies into a helper. So a single overrides
object can be passed into every helper invocation, ensuring a consistent set of overrides are used for each (transitive) operation.
Fwiw my thinking for virtual filesystems is that we would be able to override node:fs
the same way as the hooks.
One note: perhaps it'd be simpler if the helpers too a fs
reference rather than each individual function. This way users could just do:
import fs from 'node:fs';
packageResolve(..., {fs});
Rather than (which could get unwieldy if there was multiple FS IO slots):
import {tryStatSync} from 'node:fs';
packageResolve(..., {tryStatSync});
Fwiw my thinking for virtual filesystems is that we would be able to override
node:fs
the same way as the hooks.
I think the question here is whether you want to override all filesystem calls, including app code that uses fs
, or only file IO that happens as a result of import
and require
. If the latter, then you should create your set of overrides for readFile
, readFileSync
and so on, and pass your new versions of those functions into the helpers like I have tryStat
in this example. Then the virtual filesystem you’ve created would apply to module loading only, and an app could still access files on the real hard disk etc.
If you really want to override all file IO, you could just replace the fs
module itself via a loader. When a resolve
hook sees fs
or node:fs
or its variations, something other than the Node builtin gets returned instead. We wouldn’t need to create distinct hooks for all the filesystem methods, I don’t think (or at least, try to get it to work with the hooks we have before proposing to add more).
It's still my understanding that the thought experiment from https://github.com/nodejs/loaders/pull/89#issuecomment-1162114548 requires composing helper functions from two different loaders into one. This design by itself doesn't seem to fill that need. So for this design to make sense, I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.
I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.
Can you explain a bit what this requirement is? There’s a lot of stuff in that comment you’re linking to.
My goal with these helpers is just to reduce boilerplate. You could use them in a sorta-compositional way by having one loader override these helpers, so that when the next loader imports them they get the custom versions. So if the first loader uses resolve
and load
to return custom source whenever packageResolve
is imported from 'module'
, this custom packageResolve
would be what gets used by the next loader in the chain (once we get the “ambient loaders” idea working). Maybe this isn’t robust enough for your needs? I’m not sure what exactly you need, or what use case you’re trying to solve.
These helpers aren’t meant to solve all outstanding feature requests for loaders. It’s just one part of the roadmap. If there’s anything about creating these helpers that hinders any future use cases, or any way that the design of these helpers can be improved to better enable those use cases, please make specific suggestions on what to change.
The API needs more work, but overall these seem like positive directions.
One approach might also be to just start simple - I don't think anyone would reject exposing isBareSpecifier
. Could well be an easy way to start.
With package boundary functions, there are some questions as to whether it should directly return configuration, or if you should use path-based APIs primarily - getPackageBoundary(url) -> boundary
+ getPackageConfig(boundary)
versus getPackageConfig(url) -> { config, boundary }
sort of thing. Minor stuff but some design decisions in play.
Note that packageResolve
doesn't return the path to the package in node_modules/pkg/
but instead handles the full resolution of a package including exports resolution etc etc. That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.
Perhaps having packageResolve
mean what you thought it meant would work better for a helper. But that design also needs to be incorporated into how you then handle resolution within the package. A separate resolvePackageExport(subpath, packageUrl, conditions)
can work. Although the paths for own-name and imports are still left out of this design flow which might make it easy for users to forget these if they are trying to write their own routines. So this gets a little harder to decompose well in a way that doesn't just expose partial functionality - I think with these pieces we need to ensure we have all the pieces in the design before exposing any one.
The yarn team might have some good insight on the design, since they have similar functions in their yarn PnP API.
For example, referring to https://github.com/nodejs/loaders/pull/94#issuecomment-1193385401:
Note that
packageResolve
... handles the full resolution ... That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.
The latter exists in yarn's PnP API, named resolveToUnqualified
. It has a well-scoped responsibility: it resolves a specifier to a package's root path, but does not attempt sub-path resolution. (IIRC, hopefully @arcanis can correct me if I'm wrong)
https://github.com/nodejs/node/blob/a506aa76a8cfb6438a0b4bc7604623bc6518a7c9/lib/internal/modules/esm/resolve.js#L774-L806
This I think clearly matches the way yarn and ts-node need to interop.
- yarn is responsible for resolving
foo/bar
to/path/to/pnp/virtual/foo.zip
or/monorepo/packages/foo
.resolveToUnqualified
does that. -
ts-node
is responsible for resolving from there to/monorepo/packages/foo/src/mod.ts
based on various rules for mapping between file extensions,outDir
->rootDir
, CommonJS support.
So when pnp and ts-node hooks are both active, the resolver should use:
- yarn's implementation of
resolveToUnqualified
- ts-node's implementation of
legacyMainResolve
I acknowledge that I'm jumping ahead, because I'm talking about composing hooks from two loaders. But I think it's still relevant to the design of these helpers.
yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.
In this case it would resolve to /path/to/pnp/virtual/foo.zip/bar
https://yarnpkg.com/advanced/pnpapi/#resolvetounqualified