Loader Hooks
Hooking into the dependency loading steps in Node.js should be easy, efficient, and reliable across CJS and ESM. Loader hooks would allow developers to make systematic changes to dependency loading without breaking other systems.
It looks like discussion on this topic has died down, but I'm really interested in loader hooks and would be excited to work on an implementation! There's a lot of prior discussion to parse through, and with this issue I'm hoping to reignite discussion and to create a place for feedback.
Some of that prior discussion:
- Bradley Farias' Resource APIs for Node and Using the Loader Hook API for `require()`
- Bradley Farias' RFC on per-package loader hooks issue #18233
- Myles' pluggable loaders issue #82 - particularly Myles' draft
- Bradley Farias' blocked multiple --loaders PR #18914
- Geoffrey Booth's runtime loader transpilation feature issue #96
- Geoffrey Booth's arbitrary module source text feature issue #97
edit (mylesborins)
here is a link to the design doc
https://docs.google.com/document/d/1J0zDFkwxojLXc36t2gcv1gZ-QnoTXSzK1O6mNAMlync/edit#heading=h.xzp5p5pt8hlq
Some use cases I've encountered:
I'm working on a custom dependency bundler and loader designed to improve cold-start times by transparently loading from a bundle to avoid file-system overhead. Currently, I have to monkey-patch `module` and reimplement CJS resolution with @soldair's node-module-resolution. I have to deeply understand and often reimplement CJS and ESM internals to work on this.
I also want to load modules from the V8 code cache, similar to v8-compile-cache. Again I have to reimplement `Module._compile` and manually handle fallback for other extensions.
Some other use cases that would benefit:
- Transpilers/compilers (babel, ts-node)
- Bundlers (webpack, browserify, ncc)
- Other custom loaders (pnp, tink, loading from zip, etc)
- Code instrumentation (coverage, logging, testing)
I think the current exposed hooks are the right hooks to expose, but we definitely need to work on polishing the API:
- dynamic modules are a bit rough atm
- how are hooks registered
- how do multiple registered hooks interact
Very excited to see interest in this! I believe that @bmeck has a POC that has memory leaks that need to be fixed. @guybedford may know about this too
@devsnek I think there's more things that list is missing. E.g. providing resource content, not just format. Or the question of if we can extend aspects of this feature to CommonJS (e.g. for the tink/entropic/yarn case that currently requires monkey-patching the CommonJS loader or even the fs module itself). The current hooks were a good starting point but I would disagree that they are the right hooks.
@jkrems i think cjs loader hooks are outside the realm of our design. cjs can only deal with local files and it uses filenames, not urls.
Providing resource content is an interesting idea though. I wonder if we could just figure out a way to pass vm modules to the loader.
@devsnek we discussed and even implemented a PoC of intercepting CJS in the middle of last year and had a talk on the how/why in both
- https://docs.google.com/presentation/d/16XSS7P2RMUtpBA8iolaMV9MpLP_Yot8lAml1trbl38I/edit#slide=id.g3ccac274de_0_125
- https://docs.google.com/presentation/d/14Jn4ry0rlONMd2Yc1ZdFl-aU42XKVEgOc9x2Hy9C9ZI/edit#slide=id.g3c7879036d_1_100
These would only allow for files for require since that is what CJS works with but it should be tenable. Interaction with require.cache is a bit precarious but solvable if enough agreement can be reached.
@bmeck i don't doubt it can be done, i'm just less convinced it makes sense to include with the esm loader hooks given the large differences in the systems.
@A-lxe thanks for opening this discussion. It was interesting to hear you say that multiple loaders were one of the features you find important here. The PR at https://github.com/nodejs/node/pull/18914 could certainly be revived. Is this something you are thinking of working on? I'd be glad to collaborate on this work if you would like to discuss it further at all.
@guybedford Yeah! At least to me it seems the current singular --loader API is insufficient for current loader use cases. For example, in my projects I test with ts-node, istanbul, mocha, and source-map-support -- each of which hooks into loading in one way or another IIRC. Optimally these could each independently interface with a loader hook API and smoothly compound on each other.
I think a node loader hook api needs to provide mechanisms for compounding on and falling back to already registered hooks (or the default cjs/esm behavior). I'm not really sure yet where to focus work, but I definitely want to collaborate :)
@A-lxe agreed we need a way to chain loaders. Would the approach in nodejs/node#18914 work for you, or if not, how would you want to go about it differently? One way to start might be to get that rebased and working again and then to iterate on it from there.
@guybedford I like the way nodejs/node#18914 chains the loaders and provides parent to allow fallback/augmentation of both the resolve + dynamic instantiation steps. I have some ideals for what a loader hook api should look like (particularly wrt supporting cjs) but I don't think those should get in the way of providing multiple --loader for esm. To be honest working on reviving that PR would be really useful for me in getting up to speed with things, so I would be happy to get started on that.
Some gripes which are more relevant to the initial --loader implementation rather than the multiple --loader feature:
- Why is there no runtime API for registering loaders? The current mechanism of using `--require` to register loaders via preloaded modules feels nice to me, and allows opting to manually register at a particular point and with dynamic parameters.
- A loader has to implement both hooks (and add fallback overhead) even if it only affects one.
- Similarly, I feel like the hooks could be more granular. @bmeck's Resource APIs for Node splits things into `locate`, `retrieve`, and `translate` hooks. With an additional `initialize` hook for actually creating the module, these match the nodejs/node#18914 functionality, with the added bonus that only `initialize` needs coupling with CJS/ESM. I'm curious what you think of this.
- Doesn't hook into CJS `require` :'(
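To illustrate the chaining/fallback shape under discussion, here is a hypothetical sketch of a resolve hook in the style explored for nodejs/node#18914: the hook receives its parent loader's resolve and only overrides the cases it cares about. The hook signature and the `./mock-request.mjs` path are illustrative assumptions, not the final API.

```javascript
// Hypothetical chained resolve hook: redirect one specifier to a local mock,
// and fall back to the parent loader (or default resolver) for the rest.
async function resolve(specifier, parentURL, parentResolve) {
  if (specifier === 'request') {
    // Illustrative redirection target; not a real package convention.
    return { url: new URL('./mock-request.mjs', parentURL).href };
  }
  return parentResolve(specifier, parentURL);
}
```

The key property is that a loader that only cares about one specifier pays no more than one `if` plus a delegation for everything else.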
Also, your last comment on nodejs/node#18914 hints at another loaders implementation by @bmeck. Does this exist in an actionable state?
@BridgeAR this work also exists as part of the new loaders work which @bmeck started, so that effectively takes over from this PR already. Closing sounds sensible to me.
> Why is there no runtime API for registering loaders?
Loaders are a higher-level feature of the environment, kind of like a boot system feature. They sit at the root of the security model for the application, so there are some security concerns here. In addition to that, hooking loaders during runtime can lead to unpredictable results, since any already-loaded modules will not get loaders applied. I'm sure @bmeck can clarify on these points, but those are the two I remember on this discussion offhand.
> A loader has to implement both hooks (and add fallback overhead) even if it only affects one.
There is nothing to say we won't have CJS loader hooks or a generalized hook system; it's just that our priority to date has been getting the ESM loader worked out. In addition, the ESM hooks allow async functions, while CJS hooks would need some manipulation to support async calls. There's also the problem of the loaders running in different resolution spaces (URLs vs. paths), as discussed. Once we have our base ESM loader API finalized, I'm sure we could extend it to CJS with some extra resolution metadata and handling of the resolution spaces, but I very much feel that loader unification is a "nice to have" that is additive over the base-level ESM API, which should be the priority for us to consolidate and work towards first. That loader stability and architecture should take preference in the development process. That said, if you want to work on CJS unification first, feel free, but there are no guarantees the loader API will be stable or even unflagged unless we work hard towards that singular goal right now. So what I'm saying is: chained loaders, whether the loader is off-thread, whether the API will be abstracted to deal with multi-realm and non-registry-based APIs, and the translate hook all take preference on the path to a stable API for me, over unifying ESM and CJS hooks. And that path is already very tenuous and unlikely, so we should focus our combined efforts on API stability first and foremost.
> Similarly, I feel like the hooks could be more granular.
Implementing a translate or fetch hook for --loader could certainly be done and was a deliberate omission in the loader API. It is purely a problem of writing the code, making a PR, and the real hard part - getting consensus!
> Doesn't hook into CJS `require` :'(
As mentioned above, this work can be done, but I would prefer to get the ground work done first.
That all makes a lot of sense and I appreciate you describing it for me 🙂
I can start with pulling nodejs/node#18914 and getting that in a working state.
Just to spark some discussion, here’s a wholly theoretical potential API that I could imagine being useful to me as a developer:
```js
import { registerHook } from 'module';
import { promises as fs, constants as fsConstants } from 'fs';

registerHook('beforeRead', async function automaticExtensionResolution (module) {
  const extensions = ['', '.mjs', '.js', '.cjs'];
  for (const extension of extensions) {
    const resolvedPathWithExtension = `${module.resolvedPath}${extension}`;
    try {
      await fs.access(resolvedPathWithExtension, fsConstants.R_OK);
      module.originalResolvedPath = module.resolvedPath;
      module.resolvedPath = resolvedPathWithExtension;
      break;
    } catch {}
  }
  return module;
}, 10);
```
The new registerHook method takes three arguments:
- The hook name, which is a point in Node's code where these callbacks will be run: `beforeRead`, `afterRead`, etc.
- The function to call at that point, which takes as input an object with all the properties related to `import` or `require` resolution and module loading that developers might want to override. Properties set on this object persist to other callbacks registered to later hooks in the process (e.g. `module.foo` set during `beforeRead` would be accessible in a different callback registered to `afterRead`).
- (Optional) The priority at which to call registered callbacks. If multiple callbacks have the same priority level, they are evaluated in the order that they were registered.
In the first example, my automaticExtensionResolution callback is registered to beforeRead because it’s important to rewrite the path that Node tries to load before Node tries to load any files from disk (because './file' wouldn’t exist but './file.js' might, and we don’t want an exception thrown before our callback can tell Node to load './file.js' instead). I’m imagining the module object here has an unused specifier property with whatever the original string was, e.g. pkg/file, and what Node would resolve that to in resolvedPath, e.g. ./node_modules/pkg/file.
Another example:
```js
import { registerHook } from 'module';
import CoffeeScript from 'coffeescript';

registerHook('afterRead', async function transpileCoffeeScript (module) {
  if (/\.coffee$|\.litcoffee$|\.coffee\.md$/.test(module.resolvedPath)) {
    module.source = CoffeeScript.compile(module.source);
  }
  return module;
}, 10);
```
This hook is registered after Node has loaded the file contents from disk (module.source) but before the contents are added to Node’s cache or evaluated. This gives my callback a chance to modify those contents before Node does anything with them.
And so on. I have no idea how close or far any of the above is from the actual implementation of the module machinery; hopefully it’s not so distant as to be useless. Most of the loader use cases in our README could be satisfied by an API like this:
- Code coverage/instrumentation: in an `afterRead` hook, a callback could count lines of code or the like.
- Runtime loaders, transpilation at import time: in an `afterRead` hook, example above.
- Arbitrary sources for module source text: in a `beforeRead` hook, our callback could assign content into `module.source` (and then Node would know not to read from disk for this module).
- Mock modules (injection): basically the same as the previous, a `beforeRead` hook could return a different path to load instead, or prefill the source code to use: `if (module.specifier === 'request') module.source = mockRequest`, etc. Ideally Node would handle it if `source` were an actual module rather than just a string to be evaluated.
- Specifier resolution customization: this is like the extension resolution example above, though if we also want to support specifiers that Node can't resolve, like `import 'https://something'`, we would need another hook like `beforeResolve`.
- Package encapsulation: we're already implementing this as `"exports"`, but it could just as easily be implemented as a loader, at least for ESM. It would be in `beforeRead` like the extension resolution example.
- Conditional imports: in `beforeRead` or `afterRead`, based on some condition the `module.source` could be set to an empty string.
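The mock-modules case above can be sketched end to end. Since `registerHook` does not exist yet, this sketch stubs it with a minimal priority-ordered registry (a hypothetical stand-in for Node's internals) so the chaining and priority semantics described above are concrete:

```javascript
// Hypothetical sketch of the proposed registerHook API. The registry and
// runHooks driver stand in for Node internals that do not exist yet.
const hooks = { beforeRead: [], afterRead: [] };

function registerHook(name, callback, priority = 10) {
  hooks[name].push({ callback, priority });
  // Lower priorities run first; ties keep registration order (stable sort).
  hooks[name].sort((a, b) => a.priority - b.priority);
}

registerHook('beforeRead', async function mockRequest (module) {
  if (module.specifier === 'request') {
    // Prefill the source so Node would skip reading from disk.
    module.source = "module.exports = () => ({ status: 200 });";
  }
  return module;
});

// How Node might run the chain for one hook point:
async function runHooks(name, module) {
  for (const { callback } of hooks[name]) module = await callback(module);
  return module;
}
```

Each callback receives the module object produced by the previous one, which is the "compounding" behavior the proposal aims for.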
Anyway this is just to start a discussion of what kind of public-facing API we would want, and the kind of use cases it would support. I’m not at all married to any of the above, I’m just hoping that we come up with something that has roughly the same versatility as this.
Thanks for this write-up @GeoffreyBooth! Some thoughts to add to the discussion:
To me this looks like a transformer architecture, which exposes the entire in-progress module object to each hook, as opposed to the current --loader implementation, which has functional resolve and instantiate hooks. I would worry about exposing too large a surface area, i.e. developers doing something like reading a new file to override the old in the afterRead hook. Besides the transformer architecture, the differences largely come down to which hooks are exposed.
This api also doesn't allow for a loader to prevent other loaders from acting, which the wip multiple loader implementation at nodejs/node#18914 does. I don't think that's a bad thing, and I would be interested in hearing what people think on that front.
I'm not sure about the optional priority parameter. I don't think loaders should know much about what other loaders are registered or be making decisions about which order they're executed in. The user controls the order by choosing the order in which they register the loaders.
These are all good points. I would err on the side of exposing a lot of surface area, though, as that’s what users are used to from CommonJS. A lot of the power of things like stubs are because the surface area is huge.
In particular, I think we do want to allow reading a new file to override the old, or at least modifying the loaded string (which of course could be modified by loading the contents of a new file); otherwise we can’t have transpilers, for example, or stubs that are conditional based on the contents of a file rather than just the name in the specifier.
The priority option is just a convenience, so that the user doesn’t need to be careful about the order that they register hooks.
One thing that I thought of after posting was to add the concept of package scope to this. A lot of loaders will only be useful in the app’s package scope, not the combined scope of the app plus all its dependencies. We probably want some easy way to limit the callbacks to just the package scope around process.cwd().
On the afterRead point, you're totally right -- there needs to be a way of knowing and overriding the original loaded source (which can currently be done by loading the source and modifying it in the `instantiate` hook). I think I gave a bad example: by providing the entire in-progress module object, the user can modify aspects of it even in hooks where it shouldn't be modified (a better example might be `module.resolvedPath = 'something'` in `afterRead`).
> by providing the entire in-progress module object, the user can modify aspects of it even in hooks where it shouldn't be modified (a better example might be `module.resolvedPath = 'something'` in `afterRead`).
Node could simply ignore any changes to “shouldn’t be modified” properties. That’s probably better than trying to lock them down or removing them from the module object, since late hooks might very well want to know things like what the resolvedPath was at an earlier step. This also feels like something we can work out in the implementation stage of building this.
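One way Node could ignore such writes, to make the idea concrete: hand each hook a Proxy view of the in-progress module object that silently drops writes to locked properties while keeping them readable. This is a sketch of the mechanism only, with hypothetical names, not a claim about the actual module internals:

```javascript
// Hypothetical view: writes to locked properties are silently ignored
// (the set trap reports success, so strict-mode code does not throw),
// while reads and writes to everything else pass through.
function hookView(module, lockedProps) {
  return new Proxy(module, {
    set(target, prop, value) {
      if (lockedProps.includes(prop)) return true; // swallow the write
      target[prop] = value;
      return true;
    }
  });
}
```

Late hooks can still read `resolvedPath` from earlier steps, satisfying the "ignore rather than remove" preference above.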
So there is a lot to talk about on loaders. We have had multiple meetings discussing some design constraints to keep in mind. I think setting up another meeting just to review things from the past would be helpful.
At the moment, from PnP's perspective:
- We'd need a hook that takes the bare request + the path of the file that makes the `require` call, and is expected to return the unqualified resolution (so in a `node_modules` context such a loader would return `/n_m/lodash` rather than `/n_m/lodash/index.js`).
  - This is required for the CJS use case, as we have no reason to reimplement the extension / `main` resolution part.
- We'd need a hook to instruct Node how to load the source for virtual files. For example, given `/cache/lodash.zip/index.js`, we would replace the default hook (which would use `readFile`) with a custom one that would open the zip, get the file source, and return it.
  - Note that in this case the source file might not exist at all. So rather than discussing in terms of paths, I think it would be beneficial to treat the resolution as an opaque identifier that doesn't necessarily match anything on the filesystem. We would then have a particular function (let's say `require.toDiskPath`) that would turn such opaque identifiers into paths usable on the disk (or throw an exception if impossible).
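The first hook PnP describes can be sketched as a pure function: bare request plus issuer path in, unqualified resolution out. The `packageLocations` table below is a hypothetical stand-in for PnP's install-time resolution data:

```javascript
// Hypothetical sketch of PnP's "unqualified resolution" hook: it returns the
// package location only, leaving extension/"main" resolution to Node.
function resolveUnqualified(request, issuer, packageLocations) {
  const location = packageLocations.get(request);
  if (location === undefined) {
    throw new Error(`Cannot resolve '${request}' required from '${issuer}'`);
  }
  // Deliberately unqualified: '/n_m/lodash', not '/n_m/lodash/index.js'.
  return location;
}
```

The point of the split is that PnP only owns the package-to-location mapping and never has to reimplement the rest of the CJS algorithm.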
the first one is already possible with our current design (not including cjs). the second one is interesting and should probably exist, but it is unlikely that a cjs version can be added without breaking some modules that are loaded by it.
> the first one is already possible with our current design (not including cjs). the second one is interesting and should probably exist, but it is unlikely that a cjs version can be added without breaking some modules that are loaded by it.
We're currently monkey-patching the fs module to add transparent support for zip and a few other extra files. It works for compatibility purposes, and it's likely we'll have to keep it as long as cjs environments are the norm, but I'm aware of the shortcomings of the approach and I'd prefer to have a long-term plan to sunset it someday, at least for ESM 🙂
I think for cjs it would be doable if the `require.resolve` API was deprecated and split between two functions: `require.req(string): Resolution` and `require.path(Resolution): URL`, but it might be out of scope for this group, and as I mentioned the fs layer is working decently well enough at the moment that it's not an emergency to find something else.
> We're currently monkey-patching the fs module to add transparent support for zip and a few other extra files. It works for compatibility purposes, and it's likely we'll have to keep it as long as cjs environments are the norm, but I'm aware of the shortcomings of the approach and I'd prefer to have a long-term plan to sunset it someday, at least for ESM 🙂
Part of my worry and reason why I feel we need to expand loader hooks as best we can for CJS is exactly that we don't guarantee this to work currently or in the future. Even if we cannot support C++ modules (the main problem with this FS patching approach that has been known since at least 2014 when I spoke on it) we can cover most situations and WASM at least can begin to replace some C++ module needs. I see this as a strong indication that we need to solve this somehow or provide some level of discretion for what is supported.
We have a mostly stable design document; please feel free to comment or request edit access as needed: https://docs.google.com/document/d/1J0zDFkwxojLXc36t2gcv1gZ-QnoTXSzK1O6mNAMlync/edit#heading=h.xzp5p5pt8hlq
The main contention is towards the bottom, around potential implementations, but the sections before that summarize a lot of different ideas from historical threads and research over the past few years.
Great to see work moving here! I really like the overall model; we maybe just have a few disagreements about the exact APIs. I've already stated my feedback in the doc, but will summarize it again here:
- I think the resolve hook and the body hook should be separated to allow for proper composability. Having the same call that resolves the module also load the module makes it harder to add simple source-instrumentation hooks or simple resolver hooks. For example, say I wanted a loader that resolves `.coffee` extensions before `.js` extensions. Calling the parent resolve function will give me `{ resolved: '/path/to/resolved.js', body: streamingBodyForResolvedJs }` for that resolved file. That is, the loader might have already opened the file descriptor for the `.js` resolution, when it is in fact a `.coffee` resolution that we want to load. This conflation seems like it might cause issues.
- Using web APIs like `Response` and `Blob` seems like bringing unnecessary web baggage to the API. For example, `Response` can be replaced by an async iterator and a string identifier for the format; `Blob` can be replaced by simply an interface containing the source string and a format string. I'm not sure what benefit is brought by using these web APIs that were designed to serve specific web use cases we don't necessarily share (at least without seeing justification for what these use cases are and why we might share them). With the use of MIMEs for formats, do we now expect every transform to define its own MIME type?
- How would loader errors be handled over the loader serialization protocol? Can an Error instance be properly cloned over this boundary with the stack information etc.? Or would we just print the stack trace to the console directly from the loader context, while providing an opaque error to the user? We need to ensure we maintain a good debugging experience for loader errors, so we need to make sure we can get error stacks. Or is the stack information itself an information leak?
Most of the above is relatively superficial though - the core of the model seems good to me. (1) means having two-phase messaging with loaders, though, so it is slightly architectural.
@guybedford
Per "separation". I agree there needs to be a "fetch"/"retrieve" hook of some kind, but not that `resolve` should not be able to return a body. The problem you explain above is about passing data to parent loaders, such as a list of extensions, but it does not seem to be fixed by separating the hooks, as far as I can tell.
Per APIs, we can argue about which APIs to use but we should start making lists of what features are desirable rather than bike shedding without purpose. To that end I'd like to posit the following:
- Most APIs working on sources do not support streaming, such as `JSON.parse`, JS parsers such as `esprima`, and `WebAssembly.compile`/`instantiate`. Even naive `RegExp` searches on the body will want to buffer to a full body before searching. I think we should not focus on streaming for the first iteration in light of this.
- Data may be wanted in either a binary format or a textual format. This largely depends on the format. Consumption methods for both should be available, as some naive steps can lead to corruption like split UTF code points. I like `Blob` because it supports this via `.text()` and `.arrayBuffer()`.
- Streaming sources need care about how they are consumed. For example, reading the start of a stream to see if it begins with a magic number. If they cannot be replayed/cloned safely, this is a problem. I like `Response`, or a subset of that API, because it has already solved these problems while preserving metadata.
- When possible, opaque data structures allow for streaming either eagerly or lazily and can be largely swapped without consequence as we determine the best approach. When doing things eagerly, they can buffer and even complete reading before being requested. When doing things lazily, they can avoid costly I/O waste if they are not consumed. To this end, I believe we should have an opaque wrapper that provides metadata and indicates whether a resource is available prior to the stream of the resource's body.
If that sounds fine, we can add constraints and a data structure to the design document.
Overall, I do not think streaming is necessarily the best first pass given how little I expect it to be useful currently.
I found `Blob` to be well suited to the above points if we wrap it in a container type so that we can iterate on streaming. It has plenty of existing documentation on how to use it, as well as compatibility and familiarity. It may not be the most ergonomic API for all use cases, but I think it fits well, and I don't see advantages in making our own.
Error stacks are able to be serialized properly, but it depends on what you are seeking from a debugging experience. They are a leak technically, but I do not consider them a fatal leak since a loader can throw their own object instead of a JS Error if they wish to censor things. Not all things thrown necessarily have a stack associated with them, so if the question is mostly about how Errors are serialized it would just be ensuring they serialize properly (whatever we decide) when being sent across the messaging system. There is a question of async stack traces if we nest messages across threads but I am unsure if we want to even support cross thread stack traces as the ids of modules could conflict unless we add more data to represent the context.
I would be wary about user actionability on these messages as Loaders are likely to be more difficult to write properly than other APIs. However, debuggers and the like should also work if they want to debug things that way.
> Per "separation". I agree there needs to be a "fetch"/"retrieve" hook of some kind, but not that `resolve` should not be able to return a body. The problem you explain above is about passing data to parent loaders, such as a list of extensions, but it does not seem to be fixed by separating the hooks, as far as I can tell.
As another example, consider a loader which applies a security policy that only certain modules on the file system can be loaded. This loader is added last in the chain, and basically provides a filter on the resolver, throwing for resolutions that are not permitted. The issue then with the model is that by the time the permission loader throws, the file might have already been opened by the underlying parent loader. This is the sort of separation of concerns that concerns me.
> Per APIs, we can argue about which APIs to use but we should start making lists of what features are desirable rather than bike shedding without purpose.
The basic requirement is being able to determine what buffer to execute, and how to execute it in the module system. The simplest interface that captures this requirement is:
```ts
interface Output {
  source: string | Buffer;
  format: 'wasm' | 'module' | 'addon' | 'commonjs' | 'json';
}
```
The above could be extended to support streams by supporting source as an async iterator as well, but I'm certainly not pushing streams support yet either.
> Error stacks are able to be serialized properly, but it depends on what you are seeking from a debugging experience.
Thanks for the clarifications re error stacks, we should just make sure we are aware of the debugging experience implications and properly support these workflows. Just getting the sync stack copied across as a string should be fine I guess.
> As another example, consider a loader which applies a security policy that only certain modules on the file system can be loaded. This loader is added last in the chain, and basically provides a filter on the resolver, throwing for resolutions that are not permitted. The issue then with the model is that by the time the permission loader throws, the file might have already been opened by the underlying parent loader. This is the sort of separation of concerns that concerns me.
Is the concern reading the file, or evaluating the file? I would be surprised if the loader actually evaluated the file. I'm also unclear how this would prevent a loader from fetching that resource even if we split the hooks, if we expose the ability to read off disk etc. to loaders.
> `interface Output { source: string | Buffer; format: 'wasm' | 'module' | 'addon' | 'commonjs' | 'json' }`
I want to agree that this is terser, but I do not think it is simpler. A few design decisions here have impacts that I find to have underlying complexity.
- Using a union type for source:
  - This necessitates detecting which type you got using code like `typeof source === 'string'` at every usage of `source`.
  - With custom formats, this is compounded, as it may be unclear which is the preferred format to serialize for other loaders.
  - By enforcing one type (given a union type's exclusive nature), this means coordinating how to normalize data somehow to know the expected type. For example, when reading files into memory a general loader would return a binary representation rather than conditionally serializing to strings.
  - By enforcing a single type for the source, it makes adding types more difficult. If you expose just `string` and `Buffer` types, code could be written using `if (typeof source === 'string') {} else {/* only assumes a buffer */}`. The solution would most likely be adding another field to prevent breaking patterns like the above, and you would end up with `body.stream`, which would somehow be exclusive with `body.source`?
- Eagerly exposing the source without a method means allocating/normalizing serialized data even if it is never used by the current loader.
- By not using an async method to expose the body/source, a head-of-line blocking problem has been introduced. A body must be completely read before being handed to another loader.
- Using an enum for the format:
  - This enum would need to be a coordinated list with MIMEs for any loader supporting `http`/`https`/`data`/`blob` URLs etc. This is compounded by not having a clear conversion step for custom formats, so that things like CoffeeScript could be converted from/to these schemes properly, which would mean loaders also participating in MIME/enum normalization (either through the runtime, or via some ecosystem module). MIMEs would not require this normalization, and have an existing coordination mechanism through IANA even for formats not seeking to live under a standards organization, via the `vnd` and `prs` prefixes.
  - Using an enum prevents metadata attachments, which are important when dealing with variants of formats. Consider parameters for dialects and encodings such as JSX; a MIME can safely encode `text/javascript;jsx=true`. It will still be picked up as `text/javascript` even if the parameter is unknown. Unknown parameters are not entirely under the scope of IANA, but MIME handling is supposed to ignore unknown parameters per RFC 2045.
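The RFC 2045 point can be illustrated with a deliberately naive MIME parser (a sketch, not a spec-complete implementation): the essence is preserved and parameters are collected, so an unknown parameter like `jsx=true` does not change how `text/javascript` is recognized:

```javascript
// Naive MIME parser: split essence from parameters; unknown parameters are
// simply carried along rather than breaking recognition of the essence.
function parseMime(mime) {
  const [essence, ...rawParams] = mime.split(';').map((part) => part.trim());
  const parameters = {};
  for (const param of rawParams) {
    const eq = param.indexOf('=');
    if (eq === -1) continue; // ignore malformed parameters
    parameters[param.slice(0, eq)] = param.slice(eq + 1);
  }
  return { essence: essence.toLowerCase(), parameters };
}
```

A loader that only understands `text/javascript` keys off `essence` and is free to ignore `parameters` it does not know.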
Why would we ever not want order to matter? If a loader wants to affect other loaders, it should just have to run before them - like any other JavaScript code anywhere.
> Is the concern reading the file, or evaluating the file? I would be surprised if the loader actually evaluated the file.
The concern is reading the file - doing unnecessary work in the OS. This is an indication that the abstraction model is missing the separation of concerns that is needed. File systems and URLs use paths as an abstraction, and separate resolution from retrieval. Yes, you can get resolution through retrieval with symlinks and redirects, but that is probably closer to alias modules.
It's pretty important to having a good composable loader API to ensure we maintain this distinction between resolution and retrieval.
> This necessitates detecting which type you got using code like `typeof source === 'string'` at every usage of `source`.
We could go with just `Buffer` or `TypedArray` by default too; I believe this resolves the next three points you mention as well.
When the time comes to introduce streams via async iteration, just having the `[Symbol.asyncIterator]` check as part of the API would make sense to me.
Alternatively if we definitely want just one type, then we can always just enforce an async iterator of buffers from the start.
> Eagerly exposing the source without a method means allocating/normalizing serialized data even if it is never used by the current loader. By not using an async method to expose the body/source, a head-of-line blocking problem has been introduced. A body must be completely read before being handed to another loader.
By going with an async iterator from the start that seems like it would resolve this concern too.
Note that the function returning such a body output can itself be an async function such that there is already an async method for providing the body.
> This enum would need to be a coordinated list with MIMEs for any loader supporting `http`/`https`/`data`/`blob` URLs etc. This is compounded by not having a clear conversion step for custom formats, so that things like CoffeeScript could be converted from/to these schemes properly, which would mean loaders also participating in MIME/enum normalization (either through the runtime, or via some ecosystem module). MIMEs would not require this normalization, and have an existing coordination mechanism through IANA even for formats not seeking to live under a standards organization, via the `vnd` and `prs` prefixes.
The list of enums is already how we do it in the current API and that seems to be working fine to me. What problems have you found with this?
Consider for example a CoffeeScript loader:
```js
export async function body(resolved) {
  return {
    output: createCoffeeTransformStream(createReadStream(resolved)),
    format: 'module'
  };
}
```
There is no need to define a `format: 'coffee'`, because the retrieval and transform are the same step; therefore the format only needs to correspond with the engine-level formats, which we already manage internally.
> Using an enum prevents metadata attachments, which are important when dealing with variants of formats. Consider parameters for dialects and encodings such as JSX; a MIME can safely encode `text/javascript;jsx=true`. It will still be picked up as `text/javascript` even if the parameter is unknown. Unknown parameters are not entirely under the scope of IANA, but MIME handling is supposed to ignore unknown parameters per RFC 2045.
Most systems use a configuration file on the file system for managing transform options. tsconfig.json, babel.config.js etc. This provides the high degree of customization that these tools require.
I don't think most build tools would want to register a MIME and use this as a custom serialization scheme for their options.