node icon indicating copy to clipboard operation
node copied to clipboard

Modules loaded through `--import` seem to use a separate ESM cache

Open timmolendijk opened this issue 2 years ago • 8 comments

Version

v18.8.0

Platform

Darwin Tims-MacBook-Pro.local 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000 arm64

Subsystem

ECMAScript modules

What steps will reproduce the bug?

When running the following command:

node --experimental-loader ./loader.mjs ./run.mjs

Given the following context:

// loader.mjs
import './shared-dep.mjs';
console.log('loaded loader.mjs');
// shared-dep.mjs
console.log('loaded shared-dep.mjs');
// run.mjs
import './loader.mjs';
import './shared-dep.mjs';
console.log('loaded run.mjs');

The output is as follows:

loaded shared-dep.mjs
loaded loader.mjs
loaded shared-dep.mjs
loaded loader.mjs
loaded run.mjs

Notice how the imports from --experimental-loader ./loader.mjs are reloaded once they are imported from node ./run.mjs, while the two times that shared-dep.mjs is imported as a result of the latter triggers only a single load. This looks to me like we are dealing with two ESM caches here.

How often does it reproduce? Is there a required condition?

Sharing modules between A & B when doing node --experimental-loader A B.

What is the expected behavior?

loaded shared-dep.mjs
loaded loader.mjs
loaded run.mjs

What do you see instead?

loaded shared-dep.mjs
loaded loader.mjs
loaded shared-dep.mjs
loaded loader.mjs
loaded run.mjs

Additional information

The two do share the same global.

timmolendijk avatar Sep 09 '22 19:09 timmolendijk

The two do share the same global.

Note that it is expected to change soon-ish, as the plan is to move the loaders to a separate thread. Do you expect them to share the same cache? If we move forward with the plan, loaders would definitely use their own cache.

That being said, it looks like the same happen with --import, which I find surprising:

mkdir repro && cd repro
echo 'import "./shared-dep.mjs";console.log("loaded loader.mjs");' > loader.mjs
echo 'console.log("loaded shared-dep.mjs");' > shared-dep.mjs
echo 'import "./loader.mjs";import "./shared-dep.mjs";console.log("loaded run.mjs");' > run.mjs
node --import ./loader.mjs run.mjs
cd .. && rm -r repro

/cc @nodejs/loaders @MoLow

aduh95 avatar Sep 10 '22 16:09 aduh95

Do you expect them to share the same cache?

My use case is that I am using the custom loader when running tests to enable module mocks. How this works is that the load hook returns a mock from a Map instance that therefore needs to be shared between test code and custom loader. Currently this Map instance effectively is not shared (and instead I end up with two instances) due to the separate ESM caches. So yes, I was expecting them to share the same cache, as the custom load hook is the only option I see for mocking module imports.

I currently work around the issue by assigning the Map to a property on global, but I am not surprised to learn that this work-around is bound to break soon.

timmolendijk avatar Sep 10 '22 17:09 timmolendijk

By the way: where I would expect separate ESM caches, but where they currently seem to share a single one, is when running node --test. The docs mention that each test module is run in a separate process. This makes me expect that each test module will be run in isolation and that each of them will receive a fresh environment. Would be very useful in a testing situation!

Unfortunately as far as I can see all test modules currently share the same ESM cache.

timmolendijk avatar Sep 10 '22 19:09 timmolendijk

@timmolendijk if you have a repro for running node --test and using the same ESM cache can you open a separate issue for it with the repro?

MoLow avatar Sep 11 '22 06:09 MoLow

When using node --test, each file is run in a different process, and therefore have different ESM caches and globals, I'm not sure what you mean. EDIT: Moshe is right, let's not derive the conversation, I'm hiding my comment as off-topic.

aduh95 avatar Sep 11 '22 09:09 aduh95

They definitely use different caches: there are 2 independent instances of ESMLoader, each of which instantiate their own Map. This is very much by design.

Indeed once they are separated by thread boundaries (https://github.com/nodejs/node/pull/44710) they will be further separated.

JakobJingleheimer avatar Sep 20 '22 18:09 JakobJingleheimer

@JakobJingleheimer that doesn't apply to --import though, right?

aduh95 avatar Sep 20 '22 21:09 aduh95

It does: --import uses/affects one of the ESMLoader instances, and when loaders are moved off-thread, any global mutations import might do will not affect the other.

JakobJingleheimer avatar Sep 20 '22 21:09 JakobJingleheimer

Interesting point @aduh95, I also would have expected --import to only apply to the main user loader like --require does, rather than executing in both the loader context and the main context. I guess conversely we also need to figure out if --require should apply to the loader worker with this separation as well.

guybedford avatar Sep 21 '22 17:09 guybedford

I guess conversely we also need to figure out if --require should apply to the loader worker with this separation as well.

Some related discussion (from August) in the OpenJS slack:

https://openjs-foundation.slack.com/archives/C01810KG1EF/p1659692532627829

Maël Nison 1:42 PM should --require execute within the loader thread? 1:46 PM (and if they do, do they execute before the loaders have run, or after? that becomes tricky if/when the loaders start to alter the cjs resolution) 2:15 PM also of note: at the moment --require executes before --loader. As a result, if the answer to my question is “--require is only for the app thread” (which is fine), then once loaders are migrated off threads they may be broken (because whatever was previously setup by the --require calls won’t be anymore) 2:18 PM so in that case (and only in that case), it’d be imo be beneficial to switch the evaluation order as early as possible, to lower the migration pain for loader authors (for instance, the PnP loader currently relies on the builtin modules being patched by an earlier --require call)

aduh95 2:33 PM Is it possible to switch the order? IIUC --require loads synchronously (that's relevant for async_hook users), loaders are async because they need to support ESM.

cspotcode 6:46 PM Making --require execute a second time in the loader thread is probably not ok for performance reasons? Cuz users don't have a way to opt-out

aduh95 6:51 PM It would also be very surprising to have it execute twice! Contrarily to the CJS loader system, altering the behavior of the ESM loader (monkey patching) would be considered a bug anyway, so it's definitely not a use case we want to support.

cspotcode 9:51 PM But! We should make sure that yarn's use-case is supported in some other way. I think the monkey patching is modifications to the FS API? Maybe some warning suppression, too?

Geoffrey Booth 10:49 PM well that leads into the idea of the customization api and providing more hooks for other things like customizing stack traces, repl actions, filesystem calls, etc 10:51 i think the virtual filesystem use case can be handled both within --loader (the first loader in a chain registers hooks to make the following loaders' fs calls use the virtualized fs) and --import, if it's also desired to make user app code use the virtualized fs

Geoffrey Booth 10:53 PM re load order, i think when we added --import it was discussed that a desired follow-up would be that load order follow argument order, so like --require 1.cjs --import 2.mjs --require 3.cjs --import 4.mjs be loaded in the numerical order, interleaving the cjs and esm dependencies. i feel like --loader shouldn't be interleaved like that, that any loaders should always happen first before any require/import flags despite argument order, but i can be persuaded otherwise.

aduh95 2 months ago like --require 1.cjs --import 2.mjs --require 3.cjs --import 4.mjs be loaded in the numerical order, That's technically not possible, folks should instead use --import 1.cjs --import 2.mjs --import 3.cjs --import 4.mjs to achieve this (because --import supports both CJS and ESM)

Geoffrey Booth 2 months ago ah, then we should probably call this out in the docs

Geoffrey Booth 2 months ago maybe even error on --require and --import used together, since the error could provide a simple remedy (just replace all your --requires with --import)

aduh95 2 months ago If we go the route of throwing an error, we should make sure that all the use-case of --require are covered by --import (looking at you async_hooks), otherwise it would be very frustrating for the users.

aduh95 2 months ago Also, we should keep in mind that either option may be defined in the env, so if a user wants to use --import and have a --require in their env, that's going to be annoying.

Maël Nison 1 month ago it’s quite important that --import and --require can be used together

Maël Nison 1 month ago for instance, Yarn currently does NODE_OPTIONS="--require ./.pnp.cjs"; it means our users wouldn’t be able to use --import

arcanis avatar Sep 21 '22 17:09 arcanis

Fixed by #44710 (not the OP, now that loaders are in a different thread, they definitely not share the user-land module cache. but --import is now using the correct cache)

aduh95 avatar Jun 27 '23 21:06 aduh95