workerd
workerd copied to clipboard
Initial implementation of new jsg module registry
RM-17318
First step of the new module registry implementation (the jsg only bits). Expect several more follow up PRs.
This PR provides the initial core of the new module registry implementation. It DOES NOT enable it in workerd at all, this is just the core implementation in workerd/jsg plus tests. Note that this will be gated with a compatibility flag because there are observable changes in behavior. The old module registry implementation will continue to exist but it will be incrementally refactored a bit later to reduce some complexities and eliminate some redundancies.
The new tests should illustrate fairly clearly how the new registry implementation works. I recommend spending a fair amount of time reading through the test code.
Here's a tl;dr with some highlights.
Every Worker will have a ModuleRegistry
that is built when the worker is created. The ModuleRegistry
is composed of zero or more ModuleBundle
s. It is the reponsibility of the ModuleBundle
to actually manage and maintain the Module
s.
A ModuleRegistry::Builder
is used to create the ModuleRegistry
. Multiples types of ModuleBundle
s can be added:
- The
BUNDLE
typeModuleBundle
contains modules from the worker configuration bundle itself. - The
BUILTIN
typeModuleBundle
contains modules from the runtime that can be imported by user code - The
BUILTIN_ONLY
typeModuleBundle
contains modules from the runtime that can only be imported by other built-ins - The
FALLBACK
typeModuleBundle
is used to provide dynamic resolution using a fallback service (for local dev)
Importantly, the ModuleBundle
is a virtual type that can be implemented in multiple ways. This will give us long term flexibility to add new kinds of modules from different kinds of sources without having to modify workerd/jsg
directly to support them.
Once a ModuleRegistry
is created it is entirely immutable other than any internal caching a ModuleBundle
might do to optimize performance or ensure consistent resolution.
Importantly, a ModuleRegistry
only needs to be built once. While workerd
will only ever use a ModuleRegistry
instance with a single Worker, the production environment may share a single instance across multiple replicas. The ModuleRegistry
may be bound to any number of v8::Isolate
s.
All import specifiers are URL
s. For BUNDLE
Modules, the URL base is always file:///
. For BUILTIN
and BUILTIN_ONLY
Modules, the base is always some built-in prefix like node:
or cloudflare:
or workerd:
. FALLBACK
type modules have a bit more flexibility on the base URL.
ESM modules support import.meta.url
and import.meta.resolve(...)
.
New module types can be added without making any modifications to workerd/jsg
.
All modules are resolved and evaluated lazily (when they are first imported). Dynamic imports and sync requires run within the current IoContext
(if any) so they can schedule i/o (e.g. fetch
, scheduler.wait(...)
, setTimeout(...)
) if there is a current IoContext
.
Expanded metrics are supported (via a new ResolveObserver
) so we can measure resolution times and track which built-ins are actually being used.
All ESM modules (and eventually all CommonJS modules) will use v8 compile cache.
So... why do this at all? There are a number of intended goals:
- Treat all import specifiers as URLs (per web platform standards)
- Implement web platform standard
import.meta.url
,import.meta.resolve(...)
, andimport.meta.main
- Soon, support TC-39 import attributes
- Resolve and evaluate all modules lazily
- Allow all replicas of a worker to use the same
ModuleRegistry
(avoid rebuilding it for every replica) - Support compile cache for all ESM/CJS modules
- Improve impl of fallback resolution service for local dev
- Allow dynamic imports/requires to be IoContext aware
- Enable
nodeJsCompatModule
to use node.js built-ins without needing thenodejs-compat
flag (eventually, not covered in this PR) - Allow workerd internals to use node.js built-ins even when legacy service worker syntax is used
- Allow easier introduction of new module types (can be added without modifying workerd/jsg)
- Improve observability and metrics
- Improved maintainability
Some details on handling import specifiers as URL:
- All modules contained in a worker bundle use
file:///
as the default base URL. We don't actually expose a filesystem in workerd so these aren't actual file paths but usingfile:///
is consistent with other runtimes, such as Node.js - Worker bundles CAN specify full absolute URLs other than
file:///
. For instance, they can shadow built-ins likenode:buffer
, or even usehttp://
URLs. These are used only as identifiers and never actually are used for http requests. - Single and double dot segments are normalized in the specifier. For instance, the import specifiers
foo/../bar
andbar
are equivalent. - Relative import specifiers are resolved relative to the module that is importing them. For instance, if the importing module has a base URL of
file:///foo
, then the import specifierbar
will resolve tofile:///bar
. However, if the importing module has a base URL offile:///foo/
(with a trailing slash), then the import specifierbar
will resolve tofile:///foo/bar
. - Percent-encoding in the specifier path is normalized. For instance
%66oo
is normalized tofoo
. - Query strings and fragments in the specifier are significant in that they will resolve to the same underlying module but will result in a new evaluated instance. For example,
foo?x
andfoo?y
will cause the module registry to resolve the modulefile:///foo
only once but the specifiers will evaluate into two separate instances of the module. This is consistent with the behavior we see in other runtimes.
Few other bits and pieecs:
- Import attributes are explicitly unsupported right now. This is per the recommendation of the TC-39 proposal that asks embedders to explicitly reject import attributes that are not understood in order to preserve backwards compatibility.
- Alias specifiers are supported. This means the module registry can map one import specifier as an alias of another import-maps style. If
file:///foo
is registered as an alias offile:///bar
, then both will resolve the same underlyingModule
but will be instantiated as two different instances. - WASM modules are compiled once and cached. When multiple instances are evaulated, the cache will be used, speeding initialization and reducing memory.
TODOs...
- [x] ~~Review Asan failures reported in github CI~~
- [x] ~~Add original raw specifier to
ResolveContext
https://github.com/cloudflare/workerd/pull/1553#discussion_r1458595367~~ - [x] ~~Ensure the fallback service can return a module with a different specifier https://github.com/cloudflare/workerd/pull/1553#discussion_r1458606136~~
- [x] ~~Add test that verifies circular/recursive sync require fails as expected https://github.com/cloudflare/workerd/pull/1553#discussion_r1458612430~~
- [x] ~~Ensure that percent-encoded path segments are handled correctly... e.g.
%66oo
should resolve the same module asfoo
.~~ - [x] ~~Rebase to pick up other changes~~
- [x] ~~I've managed to re-convince myself that dynamic import top-level evaluation should definitely not be able to perform i/o. This is a current restriction in the existing module registry that needs to carry through. There are too many weird complexities with allowing top-level i/o in a dynamic import. This PR will need to be updated to reimpose that restriction~~ I'm actually going to handle this in a separate follow on PR as I want to experiment a bit more to see if there's a way we can make things work.
- [x] ~~Separate bits into smaller PRs to shrink this PR~~
Does this address https://github.com/cloudflare/workerd/issues/854#issuecomment-1675324298 ?
Separated out a couple of the smaller changes to separate PRs in #1823 and #1824
Separated a bit more of this out into a separate PR https://github.com/cloudflare/workerd/pull/1977 ... just trying to shrink the size a bit more.
[build] Note that as of #2012 we no longer use a glob for the JSG source files in src/workerd/jsg/BUILD.bazel
– you'll need to explicitly add modules-new.c++
to the set, linking the new test will fail otherwise after rebasing.
I went through the header and some of the tests, the API design looks good! Will continue reviewing this tomorrow.
LGTM otherwise! 😌 I didn't really see correctness issues in the implementation, perhaps the prior comments were already useful for that although we might find some later on. Looks well-designed and coherent overall.
My only ask would be to clean up the foo.slice(0, foo.size())
pattern in the tests I mentioned by using kj::Array/ArrayPtr directly if possible. It appears around 20 times and should be somewhat amenable to Find-Replace. If it feels like something that would slow down the shipping process let's add a TODO for now.
@fhanau .. I believe I've addressed all the feedback thus far.