isolated-vm icon indicating copy to clipboard operation
isolated-vm copied to clipboard

Returning a previously imported module whose status is 'linking' is rejected because of Linker mismatch

Open d3x0r opened this issue 11 months ago • 18 comments

(Sorry for the non-brief title, I don't know how else to say that)

The check in module_handle (link following) I think could be better... strictly comparing linker; I don't seem to be able to control which linker is being used... and each import actually gets its own linker... So this is the line of code (really I just commented out the throw and ran, and all was well, but left a log in there to see when it happens)

https://github.com/laverdet/isolated-vm/blob/main/src/module/module_handle.cc#L197

This is sort of the tree that is being evaluated... but since it's async when things get logged vs when they get done is a little skewed; but basically I implemented 'if this is already been loaded, return the module' and the module that gets returned is still linking, it's not resolved, but the linker would have been associated with shell.js (maybe sack.vfs/text, which is a source file), and then later when another parallel script also loads the same module, it ends up with a linker associated with command.js and the sack.vfs/text result is the linker of shell.js and the above 'linking' check of info->linker == this is ok... it's in the same tree - it really should all be using the linker of startup.mjs I would have thought....

  • startup.mjs imports
    • shell.js imports
      • sack.vfs/text (text.mjs)
      • command.js imports
        • sack.vfs/text (text.mjs) - Module is currently being linked by another linker
        • filter_base.js
    • strip_newlines.js imports
      • filter_base.js

filter_base.js can also trigger the link error; although I did move from doing compileModuleSync instantiate evaluateSync to all async methods, and that helped some; it was originally triggering with a simpler scenario, and I had to re-add another script to trigger the warning message (not exception in my current test build)

d3x0r avatar Feb 07 '25 03:02 d3x0r

this gist has several simple files that demonstrate the exception. https://gist.github.com/d3x0r/69dcb7551175899aadbba67f06776ab5

There's a Download ZIP option, which if you extract that zip, which has a folder number in it in 'isolated-vm/tests' then node test.mjs to run the test...

This chain of imports demonstrates the exception...

instantiate failed: ab.js Module is currently being linked by another linker Error: Module is currently being linked by another linker 

and several others.

d3x0r avatar Feb 07 '25 05:02 d3x0r

What's the use-case for dynamically dispatching module source from within the isolate? Generally what I've seen is that most consumers of isolated-vm know the source that will be executed by the time the isolate or context is created.

The module loader code is tricky since the underlying operation in v8 is a synchronous function, so for each outstanding linker invocation there needs to be a dedicated thread. There's a big risk of deadlock and the code there didn't have this case in mind.

laverdet avatar Feb 11 '25 06:02 laverdet

isn't source always used? strings are easy enough to copy out - I don't think that's where the issue arises.

It's because the message that causes the code to be executed is sent to the isolated-vm's handler for messages; one of which is 'run' which is given the source already loaded from some other location.

The code was also written with node's VM module; and the require I implemented for it was async; but the code was able to be executed in the module with an async Function... but when I do that import ...from "" and export doesn't work; so the source ends up being handed back out; yes I could catch the message as it's relayed to the isolated-vm;

and it ended up hard-coded in the example because where that got read was far beyond the scope of the sandboxInit/Prerun... the rest of the imports are just in the import chain of that fragment...

these are all static imports too - not import(); import() seems to lose the exception that causes a module to fail instead only emitting a not supported error....

The path is all async - so the original invocation of the script fragment returns basically immediately, freeing up that thread.

d3x0r avatar Feb 11 '25 07:02 d3x0r

I'm still not sure I really follow what you're saying.

To provide more context on what I was saying-- you should be treating the isolate as a hostile environment which can theoretically modify itself in ways which shouldn't be possible in JavaScript. So if you're directly issuing commands to isolated-vm from within itself then you have weakened the security of the entire container.

The import graph should be constructed, in its entirety, without instructions from the isolate.

laverdet avatar Feb 11 '25 18:02 laverdet

I can articulate a use a case for compiling and instantiating modules dynamically.

I'm a fair way in to a project that lets users persist and interface with objects of various classes through code; you write programs to interface with the objects interactively, like in a repl. Generally, a submitted script is executed entirely within in an isolate.

Since manipulating objects through code is a very general paradigm, the potential number of classes you might want to be able to instantiate as a user is huge. However, since every submitted script uses its own isolate, I'm expecting the compilation and instantiation of every module each time to become a bottleneck.

I've attempted to solve this by only instantiating modules that I can derive are required, and lazy loading the rest. E.g., a user script fetches a dictionary that holds some objects of any kind. You execute the script and find that one of the objects is a Foo. Now the set of modules that allow Foo to be imported has to be instantiated.

At this point, I've attempted let the isolate call an externalized function to read the module source, let the isolate compile and instantiate it, and then give control back the isolate. However, at the compilation step the process hangs.

Below, the lazy loader that is initialized is an external module exporting a function that can be called from within the isolate. The function calls the callback passed during initialization.

initLazyLoader(async (specifier: string, referrerUrl: string) => {

	// importing `specifier` allows the symbol to be lazy loaded to be accessed
	const modulePromise = isolate.compileModule(`import "${specifier}";`); // the process hangs here

	// `modules` is a cache to prevent instantiating modules mutiple times during construction of import graph
	modules.set(specifier, modulePromise);

	const module = await modulePromise;

	await module.instantiate(context, (specifier, referrer) => (
		_import(specifier, referrer, modules)
	));

	await module.evaluate({
		timeout: 5e3,
		promise: true,
	});
});

Is there any way to accomplish the desired outcome of adding an additional module during runtime in isolated-vm? If not, would you recommend I create a snapshot of all modules ready to go and start everything from there, or perhaps something else?

pieterjanv avatar Feb 11 '25 21:02 pieterjanv

I'm still not sure I really follow what you're saying.

To provide more context on what I was saying-- you should be treating the isolate as a hostile environment which can theoretically modify itself in ways which shouldn't be possible in JavaScript. So if you're directly issuing commands to isolated-vm from within itself then you have weakened the security of the entire container.

The import graph should be constructed, in its entirety, without instructions from the isolate.

'which can modify itself' 'but not with accepted ways' because the accepted ways weaken something?

I'm not a fan of build systems, because then you get one static blob that can't be changed.

https://www.d3x0r.org/dekware/index2.html This is somewhat what the project is... This was originally just C, with my own language which itself is Turing complete, but mostly is hooks into plugins. Plugins might be a deck of cards, a webserver, ... whatever. Things are 'Entities'... entities that are able to perform actions have a 'Sentience' which maintains an execution state for the entity; methods/macros can be defined on entities; and when executed, depending on whether the entity is awake(sentient) or not will determine where the script gets run (in the entities own sentience or in the sentience that called the command (the user probably) ). This came from a model to create MUDs (and the player characters and monsters in the world would be sentient sorts of things where swords, rocks, etc would be just entities; which if they have any special methods are done by the thing interacting with them, rather than being able to do anything themselves - well One could make a sentient sword that taunted the user I suppose...

https://github.com/d3x0r/dekcore.js?tab=readme-ov-file This - there's a high level interface language with the UI; but mostly scripts for objects are just JS now. An entity can be anything, and an awake entity (one that can perform code itself, and might have timers or other events from the world (or to the world) though COM ports, network connections, higher level things like HTTP requests and websocket connections...

/create service
/wake service
/tell service 'import "service.js"'

or maybe just tell the service the script that is service.js ... or give it a startup.

I plan on implementing logins and accounts so users can create/interface with the systems; I actually have up-to connecting to (The root operator object is called MOOSE (Master operator of system entities). HTTP-Moose receives websocket connections, and creates a entity for the connection, and wakes a handler for it in a VM (which is itself in a worker thread), and the commands are handled by the VM, which sends commands to the main process to do various operations, and get back results - the isolates are so far removed from the actual tracking of objects/entities in the system... I forwarded a lot of functions to the VM for Node; which I ended up getting crossed code where 'if( promise instanceof Promise )' wouldn't work, and I had to start also checking 'promise.constructor.name === "Promise"' because it was a promise, but a promise from outside the VM... this model (isolated-vm) makes it much more difficult to share such things and I've cleaned up some of that...

But - turns out it's just an operating platform, which runs arbitrary scripts.... One of the early projects that motivated the new dekware implementation was a bunch of services like Firewall (enable/disable connections to service), login service, service tracking core, chat service, group chat service, support services,...; all really somewhat related, but may or may not run in a single location; so not all things will need to be loaded for a secondary login node, which would connect back to the core and wouldn't nessecarily need chat interfaces loaded... which themselves are probably load balanced to other nodes.

I realized I was ending up really putting all the computation on the server; so I've also been making a symmetric sort of sentience hosting as a webworker, which would run in a users own browser for the objects they create;

So - module/dynamic loading support has improved with time; so webworkers can issue import; (though there is some small conflict trying to load packages, because the browser requires an absolute or relative path prefix so import "whatever" which can be done with node, can't be done from a browser ; but the other relative scripts that provides optional extensions to interface with various things in the system work pretty OK...

Node cannot give the correct context when loading .node addon... it's always the outside codes require() that ends up getting run; so already there ends up leaking the worker into the VM...

d3x0r avatar Feb 11 '25 22:02 d3x0r

I'm expecting the compilation and instantiation of every module each time to become a bottleneck

Is this code authored by you or another developer that you don't trust? Websites on the internet routinely download and evaluate double-digit megabytes of JavaScript code on mobile browsers, so I'm not sure I understand your bottleneck concern. If the code is known ahead of time then I would recommend a snapshot. If it supplied by the user then I would recommend compiling and linking the whole graph upfront.

laverdet avatar Feb 11 '25 22:02 laverdet

Is this code authored by you or another developer that you don't trust?

The app host can whitelist packages that extend the app's functionality, so that's grey. The untrusted user supplies modules as well, but they are limited to interfacing with a subset of the library.

Closing the door on dynamic loading cuts off potential security holes, but requires every isolate to load the entire library. Obviously security outweighs scalability, but I'd like to have both.

If the code is known ahead of time then I would recommend a snapshot.

The user supplies modules. It is therefore known before starting the isolate, but not before compiling the snapshot. My guess is this counts as not known ahead of time, please correct me otherwise.

With regard to the snapshots feature I wonder whether it is compatible with modules.

Alternatively, it seems I could reuse isolates by switching out the context for each user submission, thereby reusing the compiled modules. This exchange suggests contexts cannot access each other:

  • What safeties are in place that prevent Javascript from breaking out of a Context?

Context::SetSecurityToken() - contexts with different tokens can't access each other's objects; that includes arrays and functions.

  • What safeties are in place that prevent Javascript from breaking out of an Isolate?

The observation that the V8 team would panic if that was possible. :-)

It would be a pretty serious security vulnerability and Google takes those seriously. Report one or two good ones through the bug bounty program and you could take the rest of the year off.

pieterjanv avatar Feb 12 '25 08:02 pieterjanv

@pieterjanv do you have a complete example? Can you make a https://gist.github.com from it? You can add multiple files - it's unix case sensitive and sorted so if you add a README.md it will get ordered first... I haven't run into non resolved promises; I might suggest adding .catches to each of the steps (or at least a try-catch around the whole thing) maybe it doesn't actually get to the end? I also don't know how that is called from the VM; in mine, I result with a promise,

Re that last bit - 'a context with a different security token' is also probably not in the same isolate (?). I'm not sure how long it would take to build full isolates with most/all general functionality already - it might also be possible to queue up some in the future to be done? Once the work(user module) is dispatched, the main thread is free to start building a new isolate. but then I'd be interested to know how much time it takes to revive/restore a snapshot into a runnable state; surely that's not entirely free.

I added a couple files to the gist, that are updated version of existing files...

This has a blob of code at the end that registers globalThis.import and globalThis.requiredCode. globalThis.import calls a function outside the vm

global.setSync('import', ivmImport );
function ivmImport( specifier, id ) {
	console.log( "ivmImport:", specifier );
	ivmRunFile( specifier, true, specifier, true ).then( (module)=>{
		context.evalClosureSync( 'globalThis.requiredCode($0,$1)', [id,module] );
		return module;
	} );
}

and import and 'requiredCode' callback which resolves the in-isolate promise for the import...

https://gist.github.com/d3x0r/69dcb7551175899aadbba67f06776ab5#file-sandboxinit2-dynamic-ivm-mjs-L207-L226

and the ab.mjs has a ab-dynamic.mjs that imports the command.js extension with import.

The module gets back; but it doesn't look like the module i'd expect... i suppose it should really be the Module.namespace that gets resulted... But then I have to use .get on the namespace reference... (not sure why because I'm pretty sure this should all be living in the current context/isolate).

const d = await command.get("default");
const d2 = await command.get("command");
console.log( "Command dynamic:", d, d2, command.default, command.command );

and import("./whatever.js") does just fail with no support; globalThis.import( ... ) almost works...

Would be nice if you passed null or undefined to .get() on a reference maybe it could return the array of keys it can return?

d3x0r avatar Feb 13 '25 02:02 d3x0r

https://v8docs.nodesource.com/node-13.2/d5/dda/classv8_1_1_isolate.html#a2b1e8fe982f0b2a8c61d9413cf94c87c I guess this?
https://v8docs.nodesource.com/node-22.4/d2/dc3/namespacev8.html#a920ca515dad3aaa22f86af87f2dcd0e5 (newer)

SetHostImportModuleDynamicallyCallback

Which can take a couple different callback signatures (one with import assertions, another without; when I was trying to import "config.json" earlier in node, I was getting 'must have an assertion for JSON type specified' (something like that).

shrug maybe one is internal....

(from v8 node 22 source)

void Isolate::SetHostImportModuleDynamicallyCallback(
    HostImportModuleDynamicallyCallback callback) {
  DCHECK_NULL(host_import_module_dynamically_with_import_assertions_callback_);
  host_import_module_dynamically_callback_ = callback;
}

void Isolate::SetHostImportModuleDynamicallyCallback(
    HostImportModuleDynamicallyWithImportAssertionsCallback callback) {
  DCHECK_NULL(host_import_module_dynamically_callback_);
  host_import_module_dynamically_with_import_assertions_callback_ = callback;
}

d3x0r avatar Feb 13 '25 07:02 d3x0r

@d3x0r I don't want to hijack this issue to engage in in-depth discussion. Maybe you can open a discussion on one of your repo's, and I can comment there.

pieterjanv avatar Feb 13 '25 07:02 pieterjanv

Go ahead... Edit: I didn't even do a test for recursive imports :)

d3x0r avatar Feb 13 '25 07:02 d3x0r

https://github.com/d3x0r/isolated-vm/commit/c68f45750493b7a0dd2ee087eafd11c707862eed#diff-9f150315f76bb753d07920312ac71a4593da5434d4757fbb4eea5a00abd37ec6R174-R183

Rough implementation of SetHostImportModuleDynamicallyCallback but I'm far from a C++ guru; and I can't seem to figure out how to dispatch the event to the uv_loop to be handled by the main JS thread... (which is what the selected lines in the above link are around)

d3x0r avatar Feb 13 '25 15:02 d3x0r

https://github.com/d3x0r/isolated-vm/pull/2 This is related to dynamic import more than the linking error the issue is titled with...

it very nearly works; but is far from elegant; and surely can be simplified by better knowledge of isolated-vm classes... In the description I tried a couple ways to explain the execution path...

I'm just having problems getting the namespace properly to resolve the final promise with; it's completing with 'undefined' at the moment; and believe you me, I'm aware of how gross my implementation is.

d3x0r avatar Feb 16 '25 03:02 d3x0r

@d3x0r I ran into this issue when attempting to async pre-load a number of modules (some with circular references); I would randomly run into this error, 90% of the time the modules would load correctly. If I remove the exception the modules would all consistently load correctly (I tested each module to make sure they function correctly). Seems like there's a race-condition with the link status.

https://github.com/garymathews/isolated-vm/commit/df4a18b642e96bfbed5136a5ef1b79183f9de85e

I'm curious if your use-case works with this exception removed. Obviously this is not a proper "fix", but could expose this as a race-condition.

garymathews avatar Mar 08 '25 03:03 garymathews

@garymathews really either way - it's already linking, and nothing needs to be done. so just changing the break to return is what I currently have.... and that handles recursive cases where the linker === this.

Edit: Mine is just a log to stdout....

d3x0r avatar Mar 08 '25 23:03 d3x0r

Should this exception be removed then, since recursive linking is a valid case.

garymathews avatar Mar 10 '25 02:03 garymathews

@garymathews yes

d3x0r avatar Mar 10 '25 21:03 d3x0r