modules
modules copied to clipboard
import(cjs) with query strings has odd behavior
Hello, module team. May I get help with the way to load packages without cache? I have seen a bit of odd behavior in import(cjs).
Copied from https://github.com/nodejs/node/issues/29812
- Version: 12.11.1, 13.0.1
- Platform: Windows 10
- Subsystem: ES modules
From https://github.com/eslint/eslint/pull/12333
Repro https://github.com/mysticatea/import-cjs-issue
Description
I tried to import packages without import cache. From the document, it looks like I should use query strings.
await import("x-esm") //→ Found as expected
await import("./node_modules/x-esm/index.js?q=0") //→ Found and `x-esm` re-ran as expected
await import("./node_modules/x-esm/index.js?q=1") //→ Found and `x-esm` re-ran as expected
x-esm is an ES module package. It worked as expected.
const cjs = await import("x-cjs") //→ Found as expected
const cjs0 = await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
const cjs1 = await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
console.log(cjs === cjs0, cjs === cjs1, cjs0 === cjs1) //→ all are false but `x-cjs` didn't run three times
x-cjs is a CJS package. The result was odd. The console.log() in x-cjs package ran only one time, but the returned values are different for each query string.
I found the entry of x-cjs in require.cache. However, the cache entry is odd as well. It's different from require("x-cjs"), the entry doesn't have parent property and the module.children of test.js is still empty.
Anyway, I tried to remove the cache entry.
console.log("---- remove 'require.cache' ----")
delete require.cache[require.resolve("x-cjs")]
await import("x-cjs") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=2") //→ Found and `x-cjs` re-ran as expected
await import("./node_modules/x-cjs/index.js?q=3") //→ Found but `x-cjs` didn't re-run
Cryptic. I guess this behavior is:
import(cjs)has cache apart fromrequire.cache.- The
import(cjs)cache is created fromrequire.cache. - It runs CJS package only if
require.cacheentry was not found. - The
import(cjs)cache is not removed even ifrequire.cacheentry deleted.
Therefore, I have to do the following steps if I want to import packages without cache.
- Find the main file of the package because I cannot put query strings to the package name.
const url = "file:" + require.resolve(packageName) + uniqueQueryString; - Import it.
const ret = await import(url); - Delete
require.cacheentry.delete require.cache[require.resolve(packageName)];
Questions
- Is it intentional behavior that
import(cjs)creates incompleterequire.cacheentries? - If yes, is it intentional behavior that
import(cjs)with query strings returns different objects for the same CJS package?
I'm guessing that import(cjs) should not create any require.cache entries, and import(cjs) with query strings re-runs CJS packages as same as ES packages.
Thanks for running those experiments and letting us know about your findings!
x-cjsis a CJS package. The result was odd. Theconsole.log()inx-cjspackage ran only one time, but the returned values are different for each query string.
Can you try to compare .default on all of these? That value should be identical and I assume it is what counts for users.
I guess this behavior is: [...]
Yes, import() has its own cache (the "module map" in ES module speak) that is based on URLs instead of file paths. The algorithm is roughly:
// Load module for a URL not yet in the module map
function loadModule(url, referrerUrl) {
if (isCommonJS(url)) {
// - This is where that fresh namespace object comes from (top of my post).
// - The toFileSystemPath is what drops the query string.
// - This code doesn't care if the referrerUrl is a CJS file, exists in the require.cache, etc.
const require = createRequire(toFileSystemPath(referrerUrl);
return { default: require(toFileSystemPath(url)) };
}
// continue loading standard JS modules
}
This also why technically the require.cache entries aren't incomplete. There's two completely independent module systems at work. require.cache only knows about the require system. And from that perspective, there isn't any parent. And there's no require-system children either when the module used import(). require.cache generally has no knowledge of what goes on in the ES module system and that includes import() inside of CommonJS files.
I'm guessing that
import(cjs)should not create anyrequire.cacheentries, andimport(cjs)with query strings re-runs CJS packages as same as ES packages.
The ES module system (import) doesn't support CommonJS. The only thing it knows to do is to forward to the require module system. So "one file path is one CJS module instance" and "each CJS module appears in require.cache" is things that it can't really influence. What we could do is to make require.cache invalidation a thing when we can tell that the URL doesn't match the URL originally associated with a CJS module. But I'm not sure there would be wide support for such a feature.
everything here seems to be working as intended. its just weird and complex because two separate module systems with separate caches are being welded together.
In the example above, I’d expect q=3 to be reran - a different query string is a different URL, and that one hadnt already been cached. Everything else looks right to me.
@ljharb its a new esm module, but it hits the existing require cache.
Right, but in the OP, q=3 was never evaluated in the require cache.
when ?q=2 runs, it puts a new entry into require.cache. ?q=3 hits that cache entry.
How? Query strings aren’t/shouldn’t be treated specially; a !== URL is a different URL.
How? Query strings aren’t/shouldn’t be treated specially; a !== URL is a different URL.
Query strings don't exist in file system paths and the require system only sees the file system path which is identical. So it returns the same, existing module.
Then if q=1 and q=2 were already in the cache - and didn’t rerun - why wasn’t q=3 already in the same cache?
iow I’d expect either “all query strings are normalized out”, in which case, that module should only ever run once - or, “query strings are preserved”, in which case a different query string should always rerun the first time.
@ljharb query strings are always preserved in esm, and always normalized out in cjs. In this case, each ?q=x is a new esm module, but that new esm module may wrap an existing cjs module.
So if q=0 and q=1 are distinct ESM modules (ran twice in the above example; cached the second time), then i would expect q=2 to be one and also q=3 when using import. But the above example shows 2 re-evaluating and 3 not, which seems inconsistent to me.
q=0, q=1, q=2, q=3 are all distinct es modules. "re-evaluating" in this case refers to whether the cjs module they wrap reruns.
Deleting the require cache, thus, should either have no effect, or should have an effect the first time. In the above example, it seems to have an effect the third time but not the first two or the fourth.
The full example is:
const cjs = await import("x-cjs") //→ Found as expected
const cjs0 = await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
const cjs1 = await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
// At this point:
// * require.cache[x-cjs/index.js] is present
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are present
console.log("---- remove 'require.cache' ----")
delete require.cache[require.resolve("x-cjs")]
// At this point:
// * require.cache[x-cjs/index.js] is *not* present
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are still present and point to previous CommonJS execution
await import("x-cjs") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
// up to this point, we've been hitting the module map and got the existing instances
// but now, we start missing the module map and run into the previously emptied require cache
await import("./node_modules/x-cjs/index.js?q=2") //→ Found and `x-cjs` re-ran as expected
await import("./node_modules/x-cjs/index.js?q=3") //→ Found but `x-cjs` didn't re-run
// At this point:
// * require.cache[x-cjs/index.js] is present but points to the new execution
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are still present and point to first CommonJS execution
// * moduleMap { 'x-cjs/index.js?q=2', 'x-cjs/index.js?q=3' } are present and point to second CommonJS execution
ah, thank you. I understand how it’s working, but it seems like it’s not working how it should.
Why would deleting the require cache have any effect at all on ESM modules? The two caches should be unrelated (if not kept in sync).
simplified example: https://engine262.js.org/#gist=31db83a9a28067300fccedc80b9e7725
Why would deleting the require cache have any effect at all on ESM modules? The two caches should be unrelated (if not kept in sync).
It only has an effect on subsequent misses in the ESM cache. Deleting the require cache had no effect on the existing ESM modules or the module map. It's only when ESM has a cache miss and asks for a "new" URL from require that it gets affected by the disappearing require.cache entries.
Stepping aside for a second, even with query strings only the entry point will actually be reevaluated (because the other specifiers are still unchanged).
You probably want to use something that creates a fresh import of the module graph e.g.:
import vm from 'vm';
async function defaultLinker(specifier: string, module: vm.Module) {
// the default (or from current loader) node linking algorithm
// hope that gets exposed at some point
return new vm.SourceTextModule(...);
}
async function importFresh(file: URL) {
const entry = new vm.SourceTextModule(fs.readFileSync(file, 'utf8'), {
identifier: file.href,
});
await entry.link(defaultLinker);
await entry.evaluate();
return entry.namespace;
}
This means you can evaluate the whole graph, also thanks to the way vm.Module means you can even cache certain specifiers for speed if you want (e.g. cache ones in node_modules).
Thank you for elaborating!
I learned that this is correct behavior, so likely I need the three steps for fresh importing:
// Find the main file of the package because I cannot put query strings to the package name.
const url = "file:" + require.resolve(packageName) + uniqueQueryString;
// Import it.
const ret = await import(url);
// Delete require.cache entry.
delete require.cache[require.resolve(packageName)];
@Jamesernator Thank you for your suggestion. The SourceTextModule class looks a good alternative. However, the class looks to not support a mix of modules and scripts (the linker accepts only modules). Is there a way to mix of those?
you have to create wrappers for anything that isn't a module. the vm.SyntheticModule class can help you with that.
@mysticatea BTW, if it helps, I've used a ?dev approach on my ESM-based static site, which I am happy to give more input on if that helps.
Just try those with the network panel open in the console:
- https://smotaal.io
- https://smotaal.io?dev
It boils down to the right dynamic import, and until top-level await lands, it is better practice imho to think of deps as clear partitions that are awaited. The gap that CJS would need to close to be portable (ie not bundled).
Sorry no CJS input here — I'm having seasonal allergies :)
@devsnek re vm.‹module stuff› what is the un/flagging roadmap there? and where do these discussions happen?
I'm quite interested in seeing how they play out with electron et al.
@SMotaal sometime after esm is stable.