loaders Helpers design doc

Helpers design doc

Open GeoffreyBooth opened this issue 1 year ago • 25 comments

I started writing a design doc for the helper functions we’re considering adding, to help people reimplement only small portions of hooks (see README). The idea is that if someone wants to change a small part of what happens within Node’s internal resolve, for example, rather than copy/pasting all the code from Node’s internal resolve and changing one thing (and then this loader getting outdated with the next version of Node where Node’s internal resolve changes) we provide lots of functions that make up all the steps of Node’s resolve, and the hook author can sting those together with whatever minor changes they want as part of the user resolve hook.

On a practical level this will mean making public many of the functions in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js, and splitting some of the larger ones into smaller ones that become public. Because many functions reference other helper functions, I’m thinking of a pattern where most of these internal calls become references that get passed in. So for example within packageResolve there’s a call to parsePackageName; the latter can be passed in as an argument to packageResolve, if the user desires to override parsePackageName within packageResolve. Leaving it undefined would mean that Node’s default internal parsePackageName would be used.

I’m opening this PR as a draft to get initial feedback on this general approach. If you all think it’s good and the best way to address how to make these functions into public APIs, then I’ll keep going on this design doc and try to define all of them (at least for resolve, for now) and then we can implement this.

Jul 10 '22 05:07 GeoffreyBooth

Does this design account for composing two loaders that are both trying to use these helpers?

Decorator pattern can work well for stuff like this. Decorator pattern could actually work for all hooks: these helpers, existing hooks, and the fs hooks.

// my-loader.mjs
export function create(hooks: NodeLoaderHooks) {
    return {
        ...hooks,
        resolve(url: Url, context) { /* resolve hook goes here.  Can call `hooks.resolve()` if appropriate */ },
        fileExists(path: string) { /* augment fileExists */ },
        findPackageJson(packageName: string, base: string, isScoped: boolean) { /* ... */ }
        // All other hook behaviors are unmodified by this loader.  Other loaders might modify them
    }
}

The snippet above has them all on a single namespace, but could also work splitting into multiple namespaces.

export function create(hooks) {
    ...hooks,
    fs: {
        ...hooks.fs,
        fileExists() { /* augment fileExists */ }
    }
}

Jul 10 '22 14:07 cspotcode

Does this design account for composing two loaders that are both trying to use these helpers?

The idea is that each helper would be a “pure” function, operating only on its input; hence the need for passing in a lot more input, from references to other functions to also containers of state like the module cache. But once every helper function is pure and stateless, they should be able to be used by any loader in a chain without conflict.

Jul 10 '22 15:07 GeoffreyBooth

I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.

Jul 10 '22 15:07 cspotcode

I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.

Presumably, with ambiant loaders (I'm working on them), one loader could just override the node:loader_utils built-in module to point to its custom implementation, right?

Jul 10 '22 15:07 arcanis

Presumably, with ambiant loaders (I’m working on them), one loader could just override the node:loader_utils built-in module to point to its custom implementation, right?

This isn’t a use case I considered, like say if you want to override a helper like packageResolve and have your custom version be used by all following loaders; but I suppose it should be possible, since loaders can override/replace/mock builtin imports. Loaders probably wouldn’t and shouldn’t be able to override the version of packageResolve that Node itself uses within its internal resolve, whether or not that gets called, because we have lots of protections against userland code messing with Node’s internals. The point of creating these helpers isn’t for them to be overridden, it’s for them to be used within user-defined hooks.

I’m still hoping to avoid needing lots of defined hooks, like separate hooks for every filesystem operation etc. For now my goal is only to reduce boilerplate among hooks that users are writing, not provide even more ways to hook into Node’s internals (at least, not as part of this effort).

Getting back to my original question, does anyone have any feedback about this general design pattern?

Jul 11 '22 03:07 GeoffreyBooth

To override node:loader_utils in an ambient loader, do you imagine it aliases the existing node:loader_utils -- possibly coming from another loader in the chain -- to a new specifier? And then emits its own code that imports from that new specifier?

Something like:

import * as wrappedHelpers from 'loader_utils_generated_alias_46873'; // avoid collisions
export function findPackageJson() { wrappedHelpers.findPackageJson(/* augmentations */) /* augmentations */ }

If helper A calls helper B, calls helper C, then helper C's dependencies should be pass-able to helper A. So that any helper can accept a Partial<Helpers> which will apply to all direct and transitive operations.

I dunno if there's any performance difference between a single createHelpers(helpers) factory / class and passing a helpers object into each function. Some of these might be hot codepaths. I've read node's stance on performance considerations.

Jul 11 '22 14:07 cspotcode

To override node:loader_utils in an ambient loader

I was thinking that each helper would be available on module, as in import { packageResolve } from 'module'. So you could override them like any other import.

But please, let’s stop focusing on overriding helpers. The question is whether the API I’ve started to sketch out in this draft PR is good or could be improved.

Jul 11 '22 16:07 GeoffreyBooth

It might help to get a usage example for invoking packageResolve with a custom implementation of tryStatSync, to make sure we're on the same page as far as intended usage in those situations.

Jul 11 '22 22:07 cspotcode

It might help to get a usage example for invoking packageResolve with a custom implementation of tryStatSync, to make sure we’re on the same page as far as intended usage in those situations.

I’ve added an example of the intended/expected usage: https://github.com/nodejs/loaders/pull/94/commits/0d263e3b7c9ff83812aa02a96e99cbbf47d7e444.

I’ll add another example with a custom implementation of one of the child functions.

Jul 12 '22 03:07 GeoffreyBooth

The situation I'm thinking of is: what's intended implementation when tryStatSync needs to be customized within a call to packageResolve? Knowing that packageResolve is going to call findPackageJson, which will call tryStatSync

Jul 12 '22 04:07 cspotcode

The situation I’m thinking of is: what’s intended implementation when tryStatSync needs to be customized within a call to packageResolve? Knowing that packageResolve is going to call findPackageJson, which will call tryStatSync

At the moment I can’t think of a realistic example for why you’d want to customize tryStatSync, but assuming you had one, it would be something like this (within the example I linked to above):

  const tryStatSync = () => true; // Or whatever
  const pathToPackage = fileURLToPath(packageResolve(specifier, context.parentURL, context.conditions, {
    tryStatSync,
  }));

Jul 12 '22 05:07 GeoffreyBooth

I'm thinking about virtual filesystem situations. Not sure if that's intended to be done in this way, or via the virtual filesystem hooks.

But I like that we got clarification on the ability to pass transitive dependencies into a helper. So a single overrides object can be passed into every helper invocation, ensuring a consistent set of overrides are used for each (transitive) operation.

Jul 12 '22 14:07 cspotcode

Fwiw my thinking for virtual filesystems is that we would be able to override node:fs the same way as the hooks.

One note: perhaps it'd be simpler if the helpers too a fs reference rather than each individual function. This way users could just do:

import fs from 'node:fs';

packageResolve(..., {fs});

Rather than (which could get unwieldy if there was multiple FS IO slots):

import {tryStatSync} from 'node:fs';

packageResolve(..., {tryStatSync});

Jul 12 '22 14:07 arcanis

Fwiw my thinking for virtual filesystems is that we would be able to override node:fs the same way as the hooks.

I think the question here is whether you want to override all filesystem calls, including app code that uses fs, or only file IO that happens as a result of import and require. If the latter, then you should create your set of overrides for readFile, readFileSync and so on, and pass your new versions of those functions into the helpers like I have tryStat in this example. Then the virtual filesystem you’ve created would apply to module loading only, and an app could still access files on the real hard disk etc.

If you really want to override all file IO, you could just replace the fs module itself via a loader. When a resolve hook sees fs or node:fs or its variations, something other than the Node builtin gets returned instead. We wouldn’t need to create distinct hooks for all the filesystem methods, I don’t think (or at least, try to get it to work with the hooks we have before proposing to add more).

Jul 13 '22 04:07 GeoffreyBooth

It's still my understanding that the thought experiment from https://github.com/nodejs/loaders/pull/89#issuecomment-1162114548 requires composing helper functions from two different loaders into one. This design by itself doesn't seem to fill that need. So for this design to make sense, I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.

Jul 13 '22 11:07 cspotcode

I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.

Can you explain a bit what this requirement is? There’s a lot of stuff in that comment you’re linking to.

My goal with these helpers is just to reduce boilerplate. You could use them in a sorta-compositional way by having one loader override these helpers, so that when the next loader imports them they get the custom versions. So if the first loader uses resolve and load to return custom source whenever packageResolve is imported from 'module', this custom packageResolve would be what gets used by the next loader in the chain (once we get the “ambient loaders” idea working). Maybe this isn’t robust enough for your needs? I’m not sure what exactly you need, or what use case you’re trying to solve.

These helpers aren’t meant to solve all outstanding feature requests for loaders. It’s just one part of the roadmap. If there’s anything about creating these helpers that hinders any future use cases, or any way that the design of these helpers can be improved to better enable those use cases, please make specific suggestions on what to change.

Jul 14 '22 20:07 GeoffreyBooth

The API needs more work, but overall these seem like positive directions.

One approach might also be to just start simple - I don't think anyone would reject exposing isBareSpecifier. Could well be an easy way to start.

With package boundary functions, there are some questions as to whether it should directly return configuration, or if you should use path-based APIs primarily - getPackageBoundary(url) -> boundary + getPackageConfig(boundary) versus getPackageConfig(url) -> { config, boundary } sort of thing. Minor stuff but some design decisions in play.

Note that packageResolve doesn't return the path to the package in node_modules/pkg/ but instead handles the full resolution of a package including exports resolution etc etc. That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.

Perhaps having packageResolve mean what you thought it meant would work better for a helper. But that design also needs to be incorporated into how you then handle resolution within the package. A separate resolvePackageExport(subpath, packageUrl, conditions) can work. Although the paths for own-name and imports are still left out of this design flow which might make it easy for users to forget these if they are trying to write their own routines. So this gets a little harder to decompose well in a way that doesn't just expose partial functionality - I think with these pieces we need to ensure we have all the pieces in the design before exposing any one.

Jul 24 '22 20:07 guybedford

The yarn team might have some good insight on the design, since they have similar functions in their yarn PnP API.

For example, referring to https://github.com/nodejs/loaders/pull/94#issuecomment-1193385401:

Note that packageResolve ... handles the full resolution ... That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.

The latter exists in yarn's PnP API, named resolveToUnqualified. It has a well-scoped responsibility: it resolves a specifier to a package's root path, but does not attempt sub-path resolution. (IIRC, hopefully @arcanis can correct me if I'm wrong) https://github.com/nodejs/node/blob/a506aa76a8cfb6438a0b4bc7604623bc6518a7c9/lib/internal/modules/esm/resolve.js#L774-L806

This I think clearly matches the way yarn and ts-node need to interop.

yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.
ts-node is responsible for resolving from there to /monorepo/packages/foo/src/mod.ts based on various rules for mapping between file extensions, outDir -> rootDir, CommonJS support.

So when pnp and ts-node hooks are both active, the resolver should use:

yarn's implementation of resolveToUnqualified
ts-node's implementation of legacyMainResolve

I acknowledge that I'm jumping ahead, because I'm talking about composing hooks from two loaders. But I think it's still relevant to the design of these helpers.

Jul 26 '22 14:07 cspotcode

yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.

In this case it would resolve to /path/to/pnp/virtual/foo.zip/bar https://yarnpkg.com/advanced/pnpapi/#resolvetounqualified

Jul 26 '22 14:07 merceyz

loaders loaders copied to clipboard

Helpers design doc

loaders
loaders copied to clipboard