node icon indicating copy to clipboard operation
node copied to clipboard

module: add __esModule to require()'d ESM

Open joyeecheung opened this issue 11 months ago • 51 comments

Tooling in the ecosystem have been using the __esModule property to recognize transpiled ESM in consuming code. For example, a 'log' package written in ESM:

export function log(val) { console.log(val); }

Can be transpiled as:

exports.__esModule = true;
exports.default = function log(val) { console.log(val); }

The consuming code may be written like this in ESM:

import log from 'log'

Which gets transpiled to:

const _mod = require('log');
const log = _mod.__esModule ? _mod.default : _mod;

So to allow transpiled consuming code to recognize require()'d real ESM as ESM and pick up the default exports, we add a __esModule property by composing a new object with the namespace as prototype and an __esModule property. This is done like this because namespace objects are not extensible and proxies or synthetic module facades are more expensive than prototype lookups.

Credit to @Jarred-Sumner from Bun for suggesting the prototype lookup idea.

Refs: https://github.com/nodejs/node/pull/51977#issuecomment-2002485317 Refs: https://github.com/nodejs/node/issues/52134

joyeecheung avatar Mar 20 '24 20:03 joyeecheung

Review requested:

  • [ ] @nodejs/loaders

nodejs-github-bot avatar Mar 20 '24 20:03 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/57864/

nodejs-github-bot avatar Mar 20 '24 20:03 nodejs-github-bot

cc @nicolo-ribaudo

While this approach strikes a balance between performance and compatibility, so I think we should do this for now, there are some glitches that may or may not matter: because the namespace is in the prototype chain, users can't easily spread or copy the exports. Object.keys() won't return the keys of the exports either. So if users want to copy the exports or get the keys (maybe to create a mock or something), they probably will end up finding their way to the namespace in the prototype chain and do what they want. I suspect it's not that rare a use case. But maybe it's inevitable to work with some quirks to support required ESM.

I wonder if it's possible to get V8 to allow the host (not random users) to customize the prototype of module namespace objects - doesn't seem that hard to do, implementation wise, and V8 already allows embedders to customize e.g. the global object template - then we can put { __esModule: true } on the prototype of the namespace object instead, which probably would be a lot more natural. But perhaps that requires a spec change and is a bit far-fetched. So this is probably as good as we can get for now.

joyeecheung avatar Mar 20 '24 20:03 joyeecheung

IIUC, the spec requires the module exotic objects to have null as prototype: https://tc39.es/ecma262/#sec-module-namespace-exotic-objects-getprototypeof

aduh95 avatar Mar 20 '24 21:03 aduh95

Could we create a synthetic module wrapper here effectively like export * from 'mod'; export const __esModule = true?

guybedford avatar Mar 21 '24 01:03 guybedford

Could we create a synthetic module wrapper here effectively like export * from 'mod'; export const __esModule = true?

That was brought up in https://github.com/nodejs/node/issues/52134 but has the following cons:

  1. __esModule would be enumerable
  2. Module facade can lead to a non-trivial overhead. Take https://github.com/nodejs/node/pull/51669#issuecomment-1927185230 for example, it regressed 9% of the startup even with only the facades created for the handful of builtins (I think SourceTextModule facade would be even more expensive than SyntheticModule ones). Part of the reason why we are doing this in core instead of letting transpilers to handle it is for performance. The facade would cancel that.

joyeecheung avatar Mar 21 '24 06:03 joyeecheung

I wonder how important it is for __esModule to be not enumerable? I can also do a prototype and check the performance of the synthetic module. While it's going to be expensive, it more or less already happens with import cjs or import 'node:builtin', we could focus on the most sane behavior first, then think about optimizations we can make (e.g. I think it should be possible to do some optimizations in V8 for this).

joyeecheung avatar Mar 21 '24 14:03 joyeecheung

I wonder how important it is for __esModule to be not enumerable?

Babel actually has a loose mode that makes it enumerable (through simple assignment, to get a smaller compiled output). It doesn't matter in most cases, except... when you spread/Object.keys/for...in, which are the same cases in which the object wrapper implemented by this PR is annoying 😬

nicolo-ribaudo avatar Mar 21 '24 17:03 nicolo-ribaudo

Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule property to that?

Then spread and Object.keys() will work, and the object can have a null prototype - at the cost of more complex initialization.

justinfagnani avatar Mar 22 '24 16:03 justinfagnani

If that is done, then I suggest that that object should also be frozen and have [Symbol.toStringTag]: "Module", to look almost like a namespace object (except that namespace objects don't actually have getters).

nicolo-ribaudo avatar Mar 22 '24 16:03 nicolo-ribaudo

I am going to do some investigation of the real performance impact before proceeding with this, because this could lead to behavior changes that may be difficult to back out of in the future. Behavior-wise, I still prefer to see reference equal import() and require() results. If not, then at least make the results copy-able or allow Object.keys() to work.

joyeecheung avatar Mar 25 '24 19:03 joyeecheung

Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule property to that?

Experimenting with the idea, I noticed that if we do this, we may be able to allow CJS -> ESM cycles (because we can then lazily return the exports of a ESM), that might be desirable for UX, regardless of the __esModule question?

(One downside of relying on this for cycles is that users might still have race conditions until we make the ESM loader fully conditionally synchronous).

joyeecheung avatar Mar 29 '24 16:03 joyeecheung

I think this is what people are already advocating here, but just in case and to be explicit: We have precedent for this in import(esm) (and import * as mod from 'some-package'), so I think the return of require(esm) should be the same, and that should facilitate referential equality.

JakobJingleheimer avatar Apr 14 '24 10:04 JakobJingleheimer

We have precedent for this in import(esm) (and import * as mod from 'some-package')

Not sure if I'm following - what's the precedent specifically?

And, as long as we do want to insert __esModule into the namespace, the result of import(esm) and require(esm) won't be reference equal anymore, unless we also insert it to the result of import(esm).

joyeecheung avatar Apr 14 '24 12:04 joyeecheung

unless we also insert it to the result of import(esm)

🤔 sure? No reason comes to mind to not do that.

JakobJingleheimer avatar Apr 14 '24 12:04 JakobJingleheimer

unless we also insert it to the result of import(esm)

🤔 sure? No reason comes to mind to not do that.

The reason not to is that it's not part of the JS modules spec and once shipped could never be removed from import(). If there a way to limit the scope, like it's only added for dynamic import() within CJS modules, then it might be somewhat less impactful?

Do people need require(foo) === await import(foo) anyway? What are they doing with that? Is it some kind of environment detection? Are they using modules in a map that have been imported different ways? I'm not sure why any generic module loading that supports JS modules wouldn't be using import() everywhere.

justinfagnani avatar Apr 14 '24 14:04 justinfagnani

Do people need require(foo) === await import(foo) anyway?

Actually I don't think it matter that much, because import(cjs) (returns a synthetic module namespace) is not reference equal to require(cjs) already, and no one complains AFAIK (though it might also have to do with that it's probably not too common to import(cjs)).

joyeecheung avatar Apr 14 '24 14:04 joyeecheung

Do people need require(foo) === await import(foo) anyway?

It's not for comparison; it facilitates avoiding the dual-package hazard (and thus can vastly simplify a package's internals):

const mod = require('some-package');

mod.foo++;
import * as mod from 'some-package';

mod.foo++;

In the above, assuming the package is done right, foo will be the same foo and thus be incremented twice (instead of two different foos each getting incremented once each). Absolutely no-one wants two different foo here.

JakobJingleheimer avatar Apr 14 '24 14:04 JakobJingleheimer

@JakobJingleheimer I don't think this use case needs reference equality of "the thing that gets returned by require() or import()", what it needs is live binding of what's inside those returned objects. The example still works even if mod.foo are done from re-exported getter/setters (solution proposed https://github.com/nodejs/node/pull/52166#issuecomment-2015415552), or if mod.foo comes from the prototype (the original solution that's still in this PR). Although I do think this suggests that we shouldn't do https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 because then you'll be modifying the live binding of the facade, not the original module.

joyeecheung avatar Apr 14 '24 15:04 joyeecheung

import * as mod from 'some-package';

mod.foo++;

In the above, assuming the package is done right, foo will be the same foo and thus be incremented twice (instead of two different foos each getting incremented once each). Absolutely no-one wants two different foo here.

mod.foo++ would throw a TypeError because JS module exports are readonly. But the point about the underlying module instance stands - and is met by the proposed solutions here.

It's just that the example needs to be:

const mod = require('some-package');
mod.incrementFoo();
import * as mod from 'some-package';
mod.incrementFoo();

where mod.incrementFoo() mutates the same module instance in both cases.

justinfagnani avatar Apr 14 '24 15:04 justinfagnani

Right actually I don't think it matters for https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 either since the module namespace is not mutable. And if the mutation is done via a method or on an exported object's properties, all the solutions proposed so far behave the same, they will just modify what's in the underlying module.

joyeecheung avatar Apr 14 '24 15:04 joyeecheung

If it's not a pointer, that'll be hell to manage. Yes we can, but surely a pointer is the most simple and desirable, no?

It's just that the example needs to be:

const mod = require('some-package'); mod.incrementFoo(); import * as mod from 'some-package'; mod.incrementFoo(); where mod.incrementFoo() mutates the same module instance in both cases

Ah, yes—fair point (I oversimplified). As long as they point to the same underlying state, great 😊 but hopefully there is as little in the middle as possible/necessary.

JakobJingleheimer avatar Apr 14 '24 16:04 JakobJingleheimer

Do people need require(foo) === await import(foo) anyway?

Actually I don't think it matter that much, because import(cjs) (returns a synthetic module namespace) is not reference equal to require(cjs) already, and no one complains AFAIK (though it might also have to do with that it's probably not too common to import(cjs)).

With CJS, this just has to be because of the semantic differences between require(cjs) and import(cjs). Hopefully that expectation is set and non-equality between require(esm) and import(esm) is natural and more-or-less assumed now, even though there isn't the same semantic difference.

justinfagnani avatar Apr 14 '24 17:04 justinfagnani

it's probably not too common to import(cjs))

I do this 😅 but only with legacy packages until I can replace them.

JakobJingleheimer avatar Apr 14 '24 19:04 JakobJingleheimer

Could you generate the new object from the namespace object, creating own enumerable getters for every export to support live bindings, and add the non-enumberable __esModule property to that?

Experimenting with the idea

Did this approach prove viable? What was the performance like?

andrewbranch avatar Apr 24 '24 16:04 andrewbranch

Could this not be implemented within module_wrap.cc to produce a require(esm)-specific namespace object that adds __esModule in a way that makes this transparent to the user?

rbuckton avatar Apr 24 '24 22:04 rbuckton

I did a benchmark with the proposed approaches:

  1. node_proto is what this PR currently does (make the namespace the prototype, and {__esModule: true} as the final object in the chain)
  2. node_desc is what https://github.com/nodejs/node/pull/52166#issuecomment-2015415552 proposed ( sort-of: I put the {__esModule: true} on the prototype)
  3. node_synthetic is a version that creates a synthetic module and re-exports everything using a native utility, with an added __esModule: true.
  4. node_synthetic_js is basically the same as 3 but does the re-export in JS using ModuleWrap

test-require-esm/load.cjs comes from the repo that I used to test loading 30+ high-impact ESM-only npm pacakages with representative usage patterns: https://github.com/joyeecheung/test-require-esm/blob/main/load.cjs

The numbers are basically the same.

Benchmark 1: ./node_proto --experimental-require-module ../test-require-esm/load.cjs
  Time (mean ± σ):     686.1 ms ±  13.8 ms    [User: 773.2 ms, System: 135.1 ms]
  Range (min … max):   668.6 ms … 705.9 ms    10 runs

Benchmark 2: ./node_desc --experimental-require-module ../test-require-esm/load.cjs
  Time (mean ± σ):     689.0 ms ±  13.4 ms    [User: 779.7 ms, System: 130.7 ms]
  Range (min … max):   671.4 ms … 713.3 ms    10 runs

Benchmark 3: ./node_synthetic --experimental-require-module ../test-require-esm/load.cjs
  Time (mean ± σ):     685.9 ms ±  12.2 ms    [User: 774.7 ms, System: 137.2 ms]
  Range (min … max):   672.3 ms … 703.4 ms    10 runs

Benchmark 4: ./node_synthetic_js --experimental-require-module ../test-require-esm/load.cjs
  Time (mean ± σ):     695.0 ms ±  14.3 ms    [User: 806.8 ms, System: 116.5 ms]
  Range (min … max):   674.9 ms … 723.0 ms    10 runs

Summary
  './node_synthetic --experimental-require-module ../test-require-esm/load.cjs' ran
    1.00 ± 0.03 times faster than './node_proto --experimental-require-module ../test-require-esm/load.cjs'
    1.00 ± 0.03 times faster than './node_desc --experimental-require-module ../test-require-esm/load.cjs'
    1.01 ± 0.03 times faster than './node_synthetic_js --experimental-require-module ../test-require-esm/load.cjs'

I also have a microbenchmark modified from https://github.com/nodejs/node/issues/52369 (but uses require(esm)) and the numbers are again not too different - maybe I could write a more formal microbenchmark to get more accurate numbers with statistical confidence, but at least I am not seeing anything significant impact even from unstable numbers.

Behavior wise I prefer 2-4 because the exports are spreadable/enumerable. 2 has the advantage that __esModule is not enumerable, so you won't see this weird bit when logging the required result, the disadvantage is that when you log it you see a getter, not the value..

joyeecheung avatar May 01 '24 15:05 joyeecheung

Could this not be implemented within module_wrap.cc to produce a require(esm)-specific namespace object that adds __esModule in a way that makes this transparent to the user?

There is currently not way to create a namespace object directly, it needs to come from a module, either a SourceTextModule or a SyntheticModule.

(Actually I now realize that there is no point chasing down the SyntheticModule path because that breaks live bindings..to preserve live bindings + use a namespace object it needs to be an export star in a SourceTextModule)

joyeecheung avatar May 01 '24 15:05 joyeecheung

I tried https://github.com/nodejs/node/pull/52166#issuecomment-2011020389 and it needs a bit of extra work - for modules with default exports it needs to add export { default } from 'original'. The number is again similar to others when testing https://github.com/joyeecheung/test-require-esm/blob/main/load.cjs

joyeecheung avatar May 02 '24 00:05 joyeecheung

I heard some comments from @jakebailey either about this being bad for TypeScript, or about not doing this being bad, but I didn't understand which of the two. Jake, could you re-explain it? 😅

nicolo-ribaudo avatar May 04 '24 07:05 nicolo-ribaudo