modules icon indicating copy to clipboard operation
modules copied to clipboard

Open issues with require(esm)

Open hybrist opened this issue 6 years ago • 28 comments

This issue is meant to track concerns raised in the past about require(esm). I'm hoping to preempt having these discussions in January after we (or rather @weswigham) already invested more work into providing an implementation of require(esm).

Previous discussions:

From the 2019-10-23 meeting notes:

Wes’ points are valid. Except we should be decisive about require(ESM). Wes did good work. Now that work has stalled, we can’t say it will appear in 6 months. But it’s not spec compliant with ESM. Promises are specced to suspend main thread. Flattening promises would be non-spec compliant. So problem is lack of work and spec adherence. Top-level await makes this observable.

— @guybedford (emphasis mine)

From #308:

Even if require(esm) was provided today it becomes hazardous to use over time as async-to-eval modules get introduced. For example a JS module using top-level await or a WebAssembly module. This can happen in dependencies you don't control, so it's not easy to defend against this hazard.

For this reason I'm also skeptical that this feature could be provided in a safe way that guaranteed future compatibility.

— @robpalme

hybrist avatar Dec 10 '19 16:12 hybrist

Example 1: Transitive Dependency Uses TLA

// This code works perfectly fine - until a transitive dependency of `some.mjs` uses TLA.
// Then suddenly it breaks and it's afaict impossible to fix since there's nothing to wait for.
// myThing suddenly is undefined even though some.mjs didn't change.
const { myThing } = require('./some.mjs');
myThing();

hybrist avatar Dec 10 '19 16:12 hybrist

Example 2: Temporary Event Loop and Resources

The syncify in the existing prototype requires the introduction of a separate event loop.

See: https://github.com/weswigham/ecmascript-modules/commit/3e6a7687266d6551ecb8ea319d328046446bfc59

If any part of the loading pipeline, including custom loaders, creates resources attached to the event loop, they can prevent the event loop from terminating. Reversely, deref'd resources may appear to "hang" with timeouts. Also, if a long-running handle is created, it may still reference the old event loop (e.g. opening a file archive and keeping it open for future loads).

hybrist avatar Dec 10 '19 17:12 hybrist

Loader hooks are async so I expect code which performed require(esm) would break when run under tooling which injects loader hooks. This would be problematic for tooling as users would blame the tool for an issue that would be outside the control of the tool.

coreyfarrell avatar Dec 10 '19 17:12 coreyfarrell

@coreyfarrell no, loader hooks work fine with sync code. see https://docs.google.com/presentation/d/16XSS7P2RMUtpBA8iolaMV9MpLP_Yot8lAml1trbl38I/edit?usp=sharing for details on how that is achievable.

bmeck avatar Dec 10 '19 17:12 bmeck

@bmeck I think that comes with an important qualifier though, right? "Hopefully upcoming off-thread loader hooks work fine with sync code". Because the current dynamic instantiate hook isn't necessarily safe in this context afaict.

hybrist avatar Dec 10 '19 18:12 hybrist

@jkrems correct, the current loader hooks are not expected to survive in their state and I'd be unlikely to endorse on thread/shared heap for them currently.

bmeck avatar Dec 10 '19 18:12 bmeck

@robpalme 's concern about async-to-execute stuff entering via your dependencies isn't unique to require(esm) - TLA is observable in modules in the same way. It's no worse than (or doesn't need to be any worse than) TLA in that regard, however you feel about that.

weswigham avatar Dec 10 '19 19:12 weswigham

The syncify in the existing prototype requires the introduction of a separate event loop.

Yah, so does child_process.execSync. Your point? It's done elsewhere in the node core code. The "hazard" is that if a cross-loop event dependency was taken, there'd potentially be a deadlock. This is not unique to require(esm) - any async pattern with a form of "locks" can deadlock (in this case, which loop is actively executing essentially holds a lock on the execution). There's obviously at least two ways to avoid this, since the problems of async execution models are well known - preemtive rescheduling (restarting the parent loop even if the child still has content after awhile to try to guarantee liveness), or simply mindful implementation that doesn't contend on the lock (eg, so the algorthms syncified only use internal event deps) which is a kind of cooperative multitasking. This is implemented in core precisely so that that invariant can be upheld (as, in core, you'd need to do something silly like in the resolver, like wait on a promise that was created prior to starting resolution, which can be easily avoided, so long as you know to), so a deadlock does not occur, which is much nicer than the "preemptive" case (and much easier to explain the result of).

weswigham avatar Dec 10 '19 19:12 weswigham

But it’s not spec compliant with ESM. Promises are specced to suspend main thread. Flattening promises would be non-spec compliant. So problem is lack of work and spec adherence. Top-level await makes this observable.

I think the reverse. Top-level await's spec chicanery very explicitly makes "flattening" promises a thing. All the execution APIs are supposed to be async to handle TLA; but TLA is also specced, very explicitly, to skip all those extra event loop turns and promise resolutions in cases where the module doesn't actually use await, in order to maintain comparability with the current execution expectation that "if a module graph only contains sync code, its whole execution happens sync". This is a flattening of a promise, a .unwrapSync call, if you will. Given this is observable in the world with TLA, I don't see it being observable in require(esm) being an issue.

weswigham avatar Dec 10 '19 19:12 weswigham

Long story short, you point out something you find distasteful with require(esm), and I can probably point out how TLA in modules ends up requiring the runtime do the same thing. If anything, I feel vindicated by the current TLA implementation. If we could get a createFunction v8 API that supported setting the Async flag (which doesn't seem hard to make from the internals I looked at, but because I'm not familiar with v8, I'm unsure how it should get done), I totally think we could even support TLA in cjs with the same caveats it has in esm (sync code is immediately unwrapped, otherwise it's waited on and can deadlock on contended resources, just like TLA). (Though it begs the question of if you can have a sloppy mode Async-flagged function, since cjs modules are sloppy mode by default.)

weswigham avatar Dec 10 '19 19:12 weswigham

The specific spec invariant with require of TLA is that the event loop is being forked, with the main event loop effectively stalled while the ESM event loop runs separately.

When spawning a child process this is being done using the process boundaries.

But within the same memory and realm this violates a number of spec timing properties of the language all at once.

JavaScript is not designed to support two event loops running, where which event loop code is loaded through, when, and when they will be locked / running is non determinate.

On Tue, Dec 10, 2019 at 14:37 Wesley Wigham [email protected] wrote:

Long story short, you point out something you find distasteful with require(esm), and I can probably point out how TLA in modules ends up requiring the runtime do the same thing. If anything, I feel vindicated by the current TLA implementation. If we could get a createFunction v8 API that supported setting the Async flag (which doesn't seem hard to make, but because I'm not familiar with v8, I'm unsure how it should get done), I totally think we could even support TLA in cjs with the same caveats it has in esm (sync code is immediately unwrapped, otherwise it's waited on and can deadlock on contended resources, just like TLA).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/modules/issues/454?email_source=notifications&email_token=AAESFSW2KBAFN6PHBR6STATQX7VV3A5CNFSM4JZBQMV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGQSQVA#issuecomment-564209748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAESFSXSTI3KTOAJVCSEFDTQX7VV3ANCNFSM4JZBQMVQ .

guybedford avatar Dec 10 '19 19:12 guybedford

TLA is observable in modules in the same way. It's no worse than (or doesn't need to be any worse than) TLA in that regard, however you feel about that.

Are you talking about a previous draft of TLA? I'm fairly certain that the ESM equivalent of Example 1 is perfectly safe and doesn't break if a transitive dependency uses TLA.

import { myThing } from './some.mjs';
// We definitely know that myThing is initialized, even if some transitive dep of some.mjs
// is using top level await.
myThing();

Can you be more specific where TLA in a transitive dep is observable in the same way in ESM?

hybrist avatar Dec 10 '19 19:12 hybrist

@jkrems I'm talking about how If I have

// @filename: root.js
import {thing} from "./setUpThing.js";
import "some-library";
import {consume} from "./consumeThing.js";
consume(thing);
// @filename: setUpThing.js
export let thing = true;
function removeThing() {
    thing = undefined;
}
Promise.resolve().then(removeThing); // schedule `thing` to be immediately removed because reasons - must be used synchronously
// @filename: consumeThing.js
export function consume(thing) {
   console.log(thing ? "all sync deps" : "async dep somewhere");
}

the log in consume can detect if some-library contains, somewhere in it's dependency tree, TLA, as if it did, the event loop turned between the first import and the third import, thereby executing the removeThing function. The current TLA allows interleaved execution once await is in play (and, as a result, allows you to deadlock yourself if you await on something that can never execute - this is known).

With respect to

const { myThing } = require('./some.mjs');
myThing();

given how TLA works in ESM, I think it's now reasonable to wait on any promises on some.mjs's execution if they can't be immediately unwrapped - actually, if I'm to replicate how the current TLA works in esm, I don't think I even need a secondary event loop anymore (as that was moreso to prevent any interleaved execution, which the current TLA proposal actually allows once TLA is in use) - I can just unwrap immediately, if possible, and, if not, spin the main event loop until the promise we're waiting on resolves or rejects (then finally yield back to the caller of require with the value or throw). Nothing worse should happen with require(esm) than what's possible with TLA.

weswigham avatar Dec 10 '19 20:12 weswigham

Okay, I'd prefer if we focus on the relatively clear usability example that doesn't require any special code (other than "some dependency uses TLA in any way whatsoever") to break. I don't think it's fair to put your example on the same level here because it requires very specific coding patterns.

So, looking at your explanation: It sounds like what you're actually suggesting as the solution is that during the execution of require, we would allow arbitrary async code execution by running the event loop until a certain condition is met (e.g. promise for the load job is settled). Which means that code like this is no longer safe:

const x = fs.createReadStream('./x');
require('./foo'); // x may start emitting error events
x.on('error', console.error);

That's definitely more of an edge case compared to "Example 1" since it requires that require is running interspersed with other application code. But I'm not sure I would support a node version that removes the old guarantee that there's no ticks during require.

hybrist avatar Dec 10 '19 21:12 hybrist

Which means that code like this is no longer safe:

TLA does the same for esm. That's the tradeoff TLA makes:

const x = fs.createReadStream('./x');
await import('./foo'); // x may start emitting error events
x.on('error', console.error);

weswigham avatar Dec 10 '19 21:12 weswigham

The moral here is that using TLA is observable by people who require/import you, as it yields back to the event loop. That's really it. That's true in esm, so it's fine for it to be true for require(esm), too.

weswigham avatar Dec 10 '19 21:12 weswigham

TLA does the same for esm. That's the tradeoff TLA makes:

That code snippet isn't the same. Your code snippet has an explicit await. The user knows 100% that having an async boundary like await in their code causes ticks. That's nothing to do with TLA. But your change means that the following code may cause a tick now which is 100% not what a user will expect:

const x = fs.createReadStream('./x');
someFunctionThatLooksSync(); // x may start emitting error events
x.on('error', console.error);

Because someFunctionThatLooksSync may use require, somewhere in the call chain. And if that require hits TLA, suddenly this synchronous function call will cause ticks of the event loop. This is absolutely new behavior. Not only new in node but new in any runtime I'm aware of. And no, outside of require(esm), it does not happen. I don't agree that we can blame this on the existence of TLA or that TLA somehow inherently implies that synchronous statements may block on async work / cause ticks. Because it doesn't in node today and it doesn't in the browser today.

hybrist avatar Dec 10 '19 21:12 hybrist

As I demonstrated here, you do not need an await in any code you control to witness the change.

weswigham avatar Dec 10 '19 21:12 weswigham

As I demonstrated here, you do not need an await in any code you control to witness the change.

What you demonstrate with that snippet is that TLA may delay file execution by ticks. That's part of the spec. This will be part of every runtime and - to a degree - everybody "knew" that the timing of file execution wouldn't be always in one tick. That's a different question.

What I'm complaining about is that an arbitrary function call may cause execution to suspend and the event loop to run in the middle of statement evaluation, not even just in the middle of a file. That's the same feature as supporting a synchronous blockUntil(somePromise) function. I would object to that feature for the same reason. In JavaScript, code runs to completion unless there's a syntactic marker like yield or await.

hybrist avatar Dec 10 '19 21:12 hybrist

To clarify on "may cause execution to suspend and the event loop to run in the middle of statement evaluation", here's an example of code that's now suddenly unsafe because it becomes relevant in which order function arguments get evaluated:

attachLoggingErrorListener(
  fs.createReadStream('./x'),
  // calling a function in the following argument isn't safe!
  // someFunctionThatLooksSync may cause ticks!
  someFunctionThatLooksSync()
);

hybrist avatar Dec 10 '19 21:12 hybrist

Atomics.wait also exists and blocks... This isn't even the only function that can do that.

weswigham avatar Dec 10 '19 22:12 weswigham

And the overwhelmingly common case is that no event loop turns happen at all. It's only if they're required (because a module requires asynchronous execution) that it occurs.

weswigham avatar Dec 10 '19 22:12 weswigham

I've opened https://github.com/nodejs/node/pull/30891 so people more familiar with how node's testing infrastructure and stuff is set up can help.

weswigham avatar Dec 10 '19 22:12 weswigham

Atomics.wait also exists and blocks... This isn't even the only function that can do that.

That's like saying "while (true) blocks as well!". These are not apples to oranges comparisons. Atomics.wait will not suddenly fire data events on a stream or cause the process to exit on error events. So far you haven't mentioned any prior/existing API that would cause this behavior. And yes - maybe in most cases the event loop wouldn't turn. That's fair. But if an ES module 10 layers down the import graph does use TLA, we're back where we started. I don't think designing a system with the blanket assumption that some feature just won't get used is a viable path.

This would effectively add an API for synchronous await to node. I personally don't think node should add an API for synchronous await. There's a reason that await of promises requires a keyword in JavaScript.

hybrist avatar Dec 10 '19 22:12 hybrist

I don't think designing a system with the blanket assumption that some feature just won't get used is a viable path.

I have a suggestion.

Currently the core problem with missing require(esm) is that it's not possible to port to ESM from the "bottom up" but only from the "top down."

For example:

// x.cjs
const {y} = require('./y.cjs');
console.log("x" + y);

// y.cjs
const {z} = require('./z.cjs');
module.exports.y = "y" + z;

// z.cjs
module.exports.z = "z";

In this example, without require(esm), you can't port z to ESM, because then y would have no way to synchronously export z to x. To port to ESM, you have to port all of your dependents (and all of their dependents) to ESM first.

So, hypothetically, what if, when using require(esm) and TLA was encountered, require would throw, with a trace pointing to from x to y to z, the module that used TLA?

That way, z could be ported to ESM without porting y or x, but you wouldn't be able to use TLA if any of your dependents (or their dependents) were CJS.

In other words: instead of making ESM be the major break, which would effectively prevent popular libraries from porting from CJS to ESM, make TLA be the major break.

dfabulich avatar Aug 07 '20 20:08 dfabulich

In other words: instead of making ESM be the major break, which would effectively prevent popular libraries from porting from CJS to ESM, make TLA be the major break.

It's not just TLA. It's also a return to the 100% synchronous and main thread blocking loader. The loader is currently implemented with promises, allowing us to add things like parsing in the background while executing other JS code. If we want to allow synchronous access to loaded ESM, we'd have a few options:

  1. import(esm) may load and run the imported code synchronously and artificially tick before returning. This would break some major assumptions about order of execution, so likely not a viable path.
  2. import(esm) artificially ticks and then blocks the main thread until all imported code has been loaded and ran. This may be viable but we'd still regress to CJS-levels of thread blocking.
  3. import(esm) is fully async and require(esm) is fully sync. This means we need de-facto two implementations of the module loader with some gnarly code to make sure they don't conflict with each other when a synchronous load happens while an asynchronous load is in progress.

I think this moves us towards ESM as just syntactic sugar for CJS which I'd consider a missed opportunity for an ESM-first future. I think we may be able to still get somewhat similar execution behavior to the browser - maybe? But it smells like something that risks deadlocks and/or weird timing issues. E.g. if a dependency has TLA but was already executed previously - could it be successfully imported synchronously?

hybrist avatar Aug 07 '20 22:08 hybrist

I think this moves us towards ESM as just syntactic sugar for CJS which I'd consider a missed opportunity for an ESM-first future.

There will be no ESM-first future until the “bottom up” problem is solved.

Option nodejs/modules#3 sounds to me like Wes’ implementation, and I, for one, approve of it. TLA is the only real problem with it. If I have to give up on require(tla) in order to access require(esm), that’s absolutely worth the trade-off, because it solves the "bottom up" problem for ESM.

dfabulich avatar Aug 08 '20 04:08 dfabulich

i am wondering why we need to solve the problem so complex we could maybe simply make the cjs loading a async process it self so CJS it self as it is sync will not care to execute inside a async process we could throw some process.nexttick magic into that to free some execution time for other stuff to execute. as every CJS part is only aware of its CJS loading it will not conflict it is like

createRequire()

we are not forced to start in a CJS context at all what blocks us from simply only offering the ESM Context and force createRequire use???

my vote is clear for breaking any existing cjs codebase without fear as we do break ESM at present without fear updating the CJS packages will be a no brainer with existing tooling.

out of my few years 8+ years with the ESM loader spec and trying to integrate existing cjs packages into browsers and diffrent JS Environments, DENO, GraalJS (other java js runtimes), teavm, nodejs (including forks like the chakra core based nodejs-mobile fork that runs on android and ios).

I did always transpil my dependencies up via rollup to ESM when that was not possible i did write the code to convert the packages to esm like "lebab" but there is also ongoing google efforts to create tooling to transpil stuff up not down like babel does.

i did even manage to create working electron apps with minimum cjs only for the init.cjs that is needed because they are using a hacked nodejs fork with adjustments and they need to algin the both libuv loops for the browser and the nodejs stuff.

the so called "ready" event when both libs are fully inited and loaded

frank-dspeed avatar Feb 12 '22 09:02 frank-dspeed

I recently ran into this problem myself first a while back with something related to storybook and more recently trying to write a plugin for a 3rd party library that runs in node (not in the browser) and performs a dynamic import within the plugin / callback

after doing a bit of googling I found a few stackoverflow threads about others asking the same question

  • https://stackoverflow.com/questions/51069002/convert-import-to-synchronous
  • https://stackoverflow.com/questions/72635032/how-to-wait-for-js-promises-to-complete-without-using-then-or-await
  • https://github.com/Microsoft/TypeScript/issues/19495

Some workarounds I found that might work.

  • tsx seems to work using the createRequire method However I think it might be cheating a bit by building everything into CJS via esbuild first as a workaround.
  • deasync I've seen comments to avoid it's use since it's using hacks to workaround the problem.
  • synckit This looks fairly promising since it mentions it doesn't use native bindings or node-gyp

Jiti

Edit - Another workaround (which unbuild uses for loading it's config files) Is to do the dynamic import via jiti

We can call run.cjs directly from node, jiti is just being referenced as a library here

// run.cjs
const jiti = require('jiti')
function test1() {
  const rootdir = process.cwd()
  const _require = jiti(rootdir, { interopDefault: true, esmResolve: true })
  const mod = _require('./testmod')
  mod.testfunc()
}
test1()

// testmod.mjs
export function testfunc() {
  console.log('testmod')
}

Hecatron avatar Feb 07 '23 14:02 Hecatron

Ever since the ES moduled became officially supported by Node.js, I wondered why there is no synchronous dynamic import, i.e a special function importSync() that does exactly what import() does, except that it is synchronous. This would be analogous to fs.*Sync() variants of file system utilities, I see no harm in it.

Server side software (services) do not benefit from asynchronous loading of program files, it's an unnecessary complication that shouldn't have been forced on us. The asynchronous nature of program loading is useful in browsers, for which programs are remote resources that must be downloaded and executed. Whenever someone tries to use ES modules from CommonJS modules or to load files with computed names from ES modules, they are exposed to this asynchronous nature. In service code, server side 🙃

acarstoiu avatar Jul 29 '23 18:07 acarstoiu