node
node copied to clipboard
module: add __esModule to require()'d ESM
Tooling in the ecosystem have been using the __esModule property to recognize transpiled ESM in consuming code. For example, a 'log' package written in ESM:
export function log(val) { console.log(val); }
Can be transpiled as:
exports.__esModule = true;
exports.default = function log(val) { console.log(val); }
The consuming code may be written like this in ESM:
import log from 'log'
Which gets transpiled to:
const _mod = require('log');
const log = _mod.__esModule ? _mod.default : _mod;
So to allow transpiled consuming code to recognize require()'d real ESM as ESM and pick up the default exports, we add a __esModule property by composing a new object with the namespace as prototype and an __esModule property. This is done like this because namespace objects are not extensible and proxies or synthetic module facades are more expensive than prototype lookups.
Credit to @Jarred-Sumner from Bun for suggesting the prototype lookup idea.
Refs: https://github.com/nodejs/node/pull/51977#issuecomment-2002485317 Refs: https://github.com/nodejs/node/issues/52134
Review requested:
- [ ] @nodejs/loaders
CI: https://ci.nodejs.org/job/node-test-pull-request/57864/
cc @nicolo-ribaudo
While this approach strikes a balance between performance and compatibility, so I think we should do this for now, there are some glitches that may or may not matter: because the namespace is in the prototype chain, users can't easily spread or copy the exports. Object.keys()
won't return the keys of the exports either. So if users want to copy the exports or get the keys (maybe to create a mock or something), they probably will end up finding their way to the namespace in the prototype chain and do what they want. I suspect it's not that rare a use case. But maybe it's inevitable to work with some quirks to support required ESM.
I wonder if it's possible to get V8 to allow the host (not random users) to customize the prototype of module namespace objects - doesn't seem that hard to do, implementation wise, and V8 already allows embedders to customize e.g. the global object template - then we can put { __esModule: true }
on the prototype of the namespace object instead, which probably would be a lot more natural. But perhaps that requires a spec change and is a bit far-fetched. So this is probably as good as we can get for now.
IIUC, the spec requires the module exotic objects to have null
as prototype: https://tc39.es/ecma262/#sec-module-namespace-exotic-objects-getprototypeof
Could we create a synthetic module wrapper here effectively like export * from 'mod'; export const __esModule = true
?
Could we create a synthetic module wrapper here effectively like export * from 'mod'; export const __esModule = true?
That was brought up in https://github.com/nodejs/node/issues/52134 but has the following cons:
- __esModule would be enumerable
- Module facade can lead to a non-trivial overhead. Take https://github.com/nodejs/node/pull/51669#issuecomment-1927185230 for example, it regressed 9% of the startup even with only the facades created for the handful of builtins (I think SourceTextModule facade would be even more expensive than SyntheticModule ones). Part of the reason why we are doing this in core instead of letting transpilers to handle it is for performance. The facade would cancel that.
I wonder how important it is for __esModule
to be not enumerable? I can also do a prototype and check the performance of the synthetic module. While it's going to be expensive, it more or less already happens with import cjs
or import 'node:builtin'
, we could focus on the most sane behavior first, then think about optimizations we can make (e.g. I think it should be possible to do some optimizations in V8 for this).
I wonder how important it is for
__esModule
to be not enumerable?
Babel actually has a loose mode that makes it enumerable (through simple assignment, to get a smaller compiled output). It doesn't matter in most cases, except... when you spread/Object.keys/for...in, which are the same cases in which the object wrapper implemented by this PR is annoying 😬
Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule
property to that?
Then spread and Object.keys()
will work, and the object can have a null
prototype - at the cost of more complex initialization.
If that is done, then I suggest that that object should also be frozen and have [Symbol.toStringTag]: "Module"
, to look almost like a namespace object (except that namespace objects don't actually have getters).
I am going to do some investigation of the real performance impact before proceeding with this, because this could lead to behavior changes that may be difficult to back out of in the future. Behavior-wise, I still prefer to see reference equal import()
and require()
results. If not, then at least make the results copy-able or allow Object.keys()
to work.
Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule property to that?
Experimenting with the idea, I noticed that if we do this, we may be able to allow CJS -> ESM cycles (because we can then lazily return the exports of a ESM), that might be desirable for UX, regardless of the __esModule
question?
(One downside of relying on this for cycles is that users might still have race conditions until we make the ESM loader fully conditionally synchronous).
I think this is what people are already advocating here, but just in case and to be explicit: We have precedent for this in import(esm)
(and import * as mod from 'some-package'
), so I think the return of require(esm)
should be the same, and that should facilitate referential equality.
We have precedent for this in import(esm) (and import * as mod from 'some-package')
Not sure if I'm following - what's the precedent specifically?
And, as long as we do want to insert __esModule
into the namespace, the result of import(esm)
and require(esm)
won't be reference equal anymore, unless we also insert it to the result of import(esm)
.
unless we also insert it to the result of
import(esm)
🤔 sure? No reason comes to mind to not do that.
unless we also insert it to the result of
import(esm)
🤔 sure? No reason comes to mind to not do that.
The reason not to is that it's not part of the JS modules spec and once shipped could never be removed from import(). If there a way to limit the scope, like it's only added for dynamic import() within CJS modules, then it might be somewhat less impactful?
Do people need require(foo) === await import(foo)
anyway? What are they doing with that? Is it some kind of environment detection? Are they using modules in a map that have been imported different ways? I'm not sure why any generic module loading that supports JS modules wouldn't be using import()
everywhere.
Do people need require(foo) === await import(foo) anyway?
Actually I don't think it matter that much, because import(cjs)
(returns a synthetic module namespace) is not reference equal to require(cjs)
already, and no one complains AFAIK (though it might also have to do with that it's probably not too common to import(cjs)
).
Do people need require(foo) === await import(foo) anyway?
It's not for comparison; it facilitates avoiding the dual-package hazard (and thus can vastly simplify a package's internals):
const mod = require('some-package');
mod.foo++;
import * as mod from 'some-package';
mod.foo++;
In the above, assuming the package is done right, foo
will be the same foo
and thus be incremented twice (instead of two different foo
s each getting incremented once each). Absolutely no-one wants two different foo
here.
@JakobJingleheimer I don't think this use case needs reference equality of "the thing that gets returned by require()
or import()
", what it needs is live binding of what's inside those returned objects. The example still works even if mod.foo
are done from re-exported getter/setters (solution proposed https://github.com/nodejs/node/pull/52166#issuecomment-2015415552), or if mod.foo
comes from the prototype (the original solution that's still in this PR). Although I do think this suggests that we shouldn't do https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 because then you'll be modifying the live binding of the facade, not the original module.
import * as mod from 'some-package'; mod.foo++;
In the above, assuming the package is done right,
foo
will be the samefoo
and thus be incremented twice (instead of two differentfoo
s each getting incremented once each). Absolutely no-one wants two differentfoo
here.
mod.foo++
would throw a TypeError because JS module exports are readonly. But the point about the underlying module instance stands - and is met by the proposed solutions here.
It's just that the example needs to be:
const mod = require('some-package');
mod.incrementFoo();
import * as mod from 'some-package';
mod.incrementFoo();
where mod.incrementFoo()
mutates the same module instance in both cases.
Right actually I don't think it matters for https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 either since the module namespace is not mutable. And if the mutation is done via a method or on an exported object's properties, all the solutions proposed so far behave the same, they will just modify what's in the underlying module.
If it's not a pointer, that'll be hell to manage. Yes we can, but surely a pointer is the most simple and desirable, no?
It's just that the example needs to be:
const mod = require('some-package'); mod.incrementFoo(); import * as mod from 'some-package'; mod.incrementFoo(); where mod.incrementFoo() mutates the same module instance in both cases
Ah, yes—fair point (I oversimplified). As long as they point to the same underlying state, great 😊 but hopefully there is as little in the middle as possible/necessary.
Do people need require(foo) === await import(foo) anyway?
Actually I don't think it matter that much, because
import(cjs)
(returns a synthetic module namespace) is not reference equal torequire(cjs)
already, and no one complains AFAIK (though it might also have to do with that it's probably not too common toimport(cjs)
).
With CJS, this just has to be because of the semantic differences between require(cjs)
and import(cjs)
. Hopefully that expectation is set and non-equality between require(esm)
and import(esm)
is natural and more-or-less assumed now, even though there isn't the same semantic difference.
it's probably not too common to import(cjs))
I do this 😅 but only with legacy packages until I can replace them.
Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule property to that?
Experimenting with the idea
Did this approach prove viable? What was the performance like?
Could this not be implemented within module_wrap.cc to produce a require(esm)-specific namespace object that adds __esModule
in a way that makes this transparent to the user?
I did a benchmark with the proposed approaches:
- node_proto is what this PR currently does (make the namespace the prototype, and
{__esModule: true}
as the final object in the chain) - node_desc is what https://github.com/nodejs/node/pull/52166#issuecomment-2015415552 proposed ( sort-of: I put the
{__esModule: true}
on the prototype) - node_synthetic is a version that creates a synthetic module and re-exports everything using a native utility, with an added
__esModule: true
. - node_synthetic_js is basically the same as 3 but does the re-export in JS using
ModuleWrap
test-require-esm/load.cjs
comes from the repo that I used to test loading 30+ high-impact ESM-only npm pacakages with representative usage patterns: https://github.com/joyeecheung/test-require-esm/blob/main/load.cjs
The numbers are basically the same.
Benchmark 1: ./node_proto --experimental-require-module ../test-require-esm/load.cjs
Time (mean ± σ): 686.1 ms ± 13.8 ms [User: 773.2 ms, System: 135.1 ms]
Range (min … max): 668.6 ms … 705.9 ms 10 runs
Benchmark 2: ./node_desc --experimental-require-module ../test-require-esm/load.cjs
Time (mean ± σ): 689.0 ms ± 13.4 ms [User: 779.7 ms, System: 130.7 ms]
Range (min … max): 671.4 ms … 713.3 ms 10 runs
Benchmark 3: ./node_synthetic --experimental-require-module ../test-require-esm/load.cjs
Time (mean ± σ): 685.9 ms ± 12.2 ms [User: 774.7 ms, System: 137.2 ms]
Range (min … max): 672.3 ms … 703.4 ms 10 runs
Benchmark 4: ./node_synthetic_js --experimental-require-module ../test-require-esm/load.cjs
Time (mean ± σ): 695.0 ms ± 14.3 ms [User: 806.8 ms, System: 116.5 ms]
Range (min … max): 674.9 ms … 723.0 ms 10 runs
Summary
'./node_synthetic --experimental-require-module ../test-require-esm/load.cjs' ran
1.00 ± 0.03 times faster than './node_proto --experimental-require-module ../test-require-esm/load.cjs'
1.00 ± 0.03 times faster than './node_desc --experimental-require-module ../test-require-esm/load.cjs'
1.01 ± 0.03 times faster than './node_synthetic_js --experimental-require-module ../test-require-esm/load.cjs'
I also have a microbenchmark modified from https://github.com/nodejs/node/issues/52369 (but uses require(esm)
) and the numbers are again not too different - maybe I could write a more formal microbenchmark to get more accurate numbers with statistical confidence, but at least I am not seeing anything significant impact even from unstable numbers.
Behavior wise I prefer 2-4 because the exports are spreadable/enumerable. 2 has the advantage that __esModule
is not enumerable, so you won't see this weird bit when logging the required result, the disadvantage is that when you log it you see a getter, not the value..
Could this not be implemented within module_wrap.cc to produce a require(esm)-specific namespace object that adds __esModule in a way that makes this transparent to the user?
There is currently not way to create a namespace object directly, it needs to come from a module, either a SourceTextModule or a SyntheticModule.
(Actually I now realize that there is no point chasing down the SyntheticModule path because that breaks live bindings..to preserve live bindings + use a namespace object it needs to be an export star in a SourceTextModule)
I tried https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 and it needs a bit of extra work - for modules with default exports it needs to add export { default } from 'original'
. The number is again similar to others when testing https://github.com/joyeecheung/test-require-esm/blob/main/load.cjs
I heard some comments from @jakebailey either about this being bad for TypeScript, or about not doing this being bad, but I didn't understand which of the two. Jake, could you re-explain it? 😅