workerd Initial implementation of new jsg module registry

RM-17318

First step of the new module registry implementation (the jsg only bits). Expect several more follow up PRs.

This PR provides the initial core of the new module registry implementation. It DOES NOT enable it in workerd at all, this is just the core implementation in workerd/jsg plus tests. Note that this will be gated with a compatibility flag because there are observable changes in behavior. The old module registry implementation will continue to exist but it will be incrementally refactored a bit later to reduce some complexities and eliminate some redundancies.

The new tests should illustrate fairly clearly how the new registry implementation works. I recommend spending a fair amount of time reading through the test code.

Here's a tl;dr with some highlights.

Every Worker will have a ModuleRegistry that is built when the worker is created. The ModuleRegistry is composed of zero or more ModuleBundles. It is the reponsibility of the ModuleBundle to actually manage and maintain the Modules.

A ModuleRegistry::Builder is used to create the ModuleRegistry. Multiples types of ModuleBundles can be added:

The BUNDLE type ModuleBundle contains modules from the worker configuration bundle itself.
The BUILTIN type ModuleBundle contains modules from the runtime that can be imported by user code
The BUILTIN_ONLY type ModuleBundle contains modules from the runtime that can only be imported by other built-ins
The FALLBACK type ModuleBundle is used to provide dynamic resolution using a fallback service (for local dev)

Importantly, the ModuleBundle is a virtual type that can be implemented in multiple ways. This will give us long term flexibility to add new kinds of modules from different kinds of sources without having to modify workerd/jsg directly to support them.

Once a ModuleRegistry is created it is entirely immutable other than any internal caching a ModuleBundle might do to optimize performance or ensure consistent resolution.

Importantly, a ModuleRegistry only needs to be built once. While workerd will only ever use a ModuleRegistry instance with a single Worker, the production environment may share a single instance across multiple replicas. The ModuleRegistry may be bound to any number of v8::Isolates.

All import specifiers are URLs. For BUNDLE Modules, the URL base is always file:///. For BUILTIN and BUILTIN_ONLY Modules, the base is always some built-in prefix like node: or cloudflare: or workerd:. FALLBACK type modules have a bit more flexibility on the base URL.

ESM modules support import.meta.url and import.meta.resolve(...).

New module types can be added without making any modifications to workerd/jsg.

All modules are resolved and evaluated lazily (when they are first imported). Dynamic imports and sync requires run within the current IoContext (if any) so they can schedule i/o (e.g. fetch, scheduler.wait(...), setTimeout(...)) if there is a current IoContext.

Expanded metrics are supported (via a new ResolveObserver) so we can measure resolution times and track which built-ins are actually being used.

All ESM modules (and eventually all CommonJS modules) will use v8 compile cache.

So... why do this at all? There are a number of intended goals:

Treat all import specifiers as URLs (per web platform standards)
Implement web platform standard import.meta.url, import.meta.resolve(...), and import.meta.main
Soon, support TC-39 import attributes
Resolve and evaluate all modules lazily
Allow all replicas of a worker to use the same ModuleRegistry (avoid rebuilding it for every replica)
Support compile cache for all ESM/CJS modules
Improve impl of fallback resolution service for local dev
Allow dynamic imports/requires to be IoContext aware
Enable nodeJsCompatModule to use node.js built-ins without needing the nodejs-compat flag (eventually, not covered in this PR)
Allow workerd internals to use node.js built-ins even when legacy service worker syntax is used
Allow easier introduction of new module types (can be added without modifying workerd/jsg)
Improve observability and metrics
Improved maintainability

Some details on handling import specifiers as URL:

All modules contained in a worker bundle use file:/// as the default base URL. We don't actually expose a filesystem in workerd so these aren't actual file paths but using file:/// is consistent with other runtimes, such as Node.js
Worker bundles CAN specify full absolute URLs other than file:///. For instance, they can shadow built-ins like node:buffer, or even use http:// URLs. These are used only as identifiers and never actually are used for http requests.
Single and double dot segments are normalized in the specifier. For instance, the import specifiers foo/../bar and bar are equivalent.
Relative import specifiers are resolved relative to the module that is importing them. For instance, if the importing module has a base URL of file:///foo, then the import specifier bar will resolve to file:///bar. However, if the importing module has a base URL of file:///foo/ (with a trailing slash), then the import specifier bar will resolve to file:///foo/bar.
Percent-encoding in the specifier path is normalized. For instance %66oo is normalized to foo.
Query strings and fragments in the specifier are significant in that they will resolve to the same underlying module but will result in a new evaluated instance. For example, foo?x and foo?y will cause the module registry to resolve the module file:///foo only once but the specifiers will evaluate into two separate instances of the module. This is consistent with the behavior we see in other runtimes.

Few other bits and pieecs:

Import attributes are explicitly unsupported right now. This is per the recommendation of the TC-39 proposal that asks embedders to explicitly reject import attributes that are not understood in order to preserve backwards compatibility.
Alias specifiers are supported. This means the module registry can map one import specifier as an alias of another import-maps style. If file:///foo is registered as an alias of file:///bar, then both will resolve the same underlying Module but will be instantiated as two different instances.
WASM modules are compiled once and cached. When multiple instances are evaulated, the cache will be used, speeding initialization and reducing memory.

Jan 18 '24 20:01 jasnell

TODOs...

[x] ~~Review Asan failures reported in github CI~~
[x] ~~Add original raw specifier to ResolveContext https://github.com/cloudflare/workerd/pull/1553#discussion_r1458595367~~
[x] ~~Ensure the fallback service can return a module with a different specifier https://github.com/cloudflare/workerd/pull/1553#discussion_r1458606136~~
[x] ~~Add test that verifies circular/recursive sync require fails as expected https://github.com/cloudflare/workerd/pull/1553#discussion_r1458612430~~
[x] ~~Ensure that percent-encoded path segments are handled correctly... e.g. %66oo should resolve the same module as foo.~~
[x] ~~Rebase to pick up other changes~~
[x] ~~I've managed to re-convince myself that dynamic import top-level evaluation should definitely not be able to perform i/o. This is a current restriction in the existing module registry that needs to carry through. There are too many weird complexities with allowing top-level i/o in a dynamic import. This PR will need to be updated to reimpose that restriction~~ I'm actually going to handle this in a separate follow on PR as I want to experiment a bit more to see if there's a way we can make things work.
[x] ~~Separate bits into smaller PRs to shrink this PR~~

Jan 18 '24 20:01 jasnell

Does this address https://github.com/cloudflare/workerd/issues/854#issuecomment-1675324298 ?

Mar 10 '24 18:03 irvinebroque

Separated out a couple of the smaller changes to separate PRs in #1823 and #1824

Mar 13 '24 15:03 jasnell

Separated a bit more of this out into a separate PR https://github.com/cloudflare/workerd/pull/1977 ... just trying to shrink the size a bit more.

Apr 05 '24 22:04 jasnell

[build] Note that as of #2012 we no longer use a glob for the JSG source files in src/workerd/jsg/BUILD.bazel – you'll need to explicitly add modules-new.c++ to the set, linking the new test will fail otherwise after rebasing.

Apr 17 '24 01:04 fhanau

I went through the header and some of the tests, the API design looks good! Will continue reviewing this tomorrow.

Apr 17 '24 01:04 fhanau

LGTM otherwise! 😌 I didn't really see correctness issues in the implementation, perhaps the prior comments were already useful for that although we might find some later on. Looks well-designed and coherent overall.

My only ask would be to clean up the foo.slice(0, foo.size()) pattern in the tests I mentioned by using kj::Array/ArrayPtr directly if possible. It appears around 20 times and should be somewhat amenable to Find-Replace. If it feels like something that would slow down the shipping process let's add a TODO for now.

Apr 30 '24 22:04 fhanau

@fhanau .. I believe I've addressed all the feedback thus far.

May 01 '24 20:05 jasnell

workerd workerd copied to clipboard

Initial implementation of new jsg module registry

workerd
workerd copied to clipboard