fix: Add globalPreload to ts-node/esm for node 20

Open isaacs opened this issue 1 year ago • 37 comments

As of node v20, loader hooks are executed in a separate isolated thread environment. As a result, they are unable to register the require.extensions hooks in a way that would (in other node versions) make both CJS and ESM work as expected.

By adding a globalPreload method, which does execute in the main script environment (but with very limited capabilities), these hooks can be attached properly, and --loader=ts-node/esm will once again make both cjs and esm typescript programs work properly.
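
As a rough illustration of the mechanism described above (not the code from this PR), a globalPreload hook returns a string of source code, and Node evaluates that string in the main script environment with a `getBuiltin` helper in scope. The `runPreloadForDemo` helper and the `__tsHooksInstalled` flag are made up for this sketch; the helper only approximates how Node runs the returned string, and in a real loader module `globalPreload` would be exported.

```javascript
// Hypothetical loader hook. In a real loader module this would be `export`ed.
function globalPreload() {
  // The returned string is what runs on the main thread. A real loader would
  // attach its require.extensions handlers here; this sketch only sets a flag.
  return `
    globalThis.__tsHooksInstalled = typeof getBuiltin === 'function';
  `;
}

// Simulation only: approximates Node evaluating the preload string as
// sloppy-mode function code with `getBuiltin` (and `port`) in scope.
function runPreloadForDemo(code) {
  const fakeGetBuiltin = (name) => {
    throw new Error(`builtin ${name} is not available in this demo`);
  };
  new Function('getBuiltin', 'port', code)(fakeGetBuiltin, undefined);
}

runPreloadForDemo(globalPreload());
```

After the simulated preload runs, `globalThis.__tsHooksInstalled` is `true`, which stands in for "the handlers were attached in the main environment".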

isaacs avatar May 06 '23 04:05 isaacs

Not sure what kind of test should be added for this, since it just returns a string. Just comparing against a fixture feels kind of redundant?

isaacs avatar May 06 '23 04:05 isaacs

Hm, this definitely needs a bit more work, because it breaks on node versions that don't have the off-thread loader hooks (i.e., everything before v20).

I'm not sure what the idiomatic approach to that is in this project, but I don't see how to detect the situation where it's required, other than sniffing the version.

isaacs avatar May 06 '23 05:05 isaacs

If it helps, in node 19 vs node 20, this file:

export function globalPreload() {
	console.log('globalPreload');
	return '';
}

export function getGlobalPreloadCode() {
	console.log('getGlobalPreloadCode');
	return '';
}

logs, in node 19:

(node:43737) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"

and in node 20:

(node:43215) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload

… so, you could probably intercept console.log at module level, catch the experimental warning (and suppress it), and use it to indicate whether you were in 20+ or not?

ljharb avatar May 06 '23 05:05 ljharb

@ljharb ooh sneaky, that might work. I'll look into that.

isaacs avatar May 06 '23 05:05 isaacs

Would this allow ts-node to be used directly again in Node 20, or would it still require it to be run through node --loader?

Eagerly awaiting the fix on this so we can support development on Node 20 🙏

ssalbdivad avatar May 06 '23 16:05 ssalbdivad

Historically I use version sniffing for this stuff, so I'd go with that. It's happened several times before that ts-node needs to implement multiple behaviors depending on the version of node or ts.
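
A minimal sketch of what such version sniffing could look like, assuming the only signal needed is "Node 20 or later". The function names `nodeMajor` and `hasOffThreadLoaders` are hypothetical and are not taken from ts-node's source.

```javascript
// Parse the major version out of a string such as "20.1.0".
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split('.')[0], 10);
}

// Assumption for this sketch: loader hooks run off-thread from v20 onward,
// as described earlier in the thread.
function hasOffThreadLoaders(version = process.versions.node) {
  return nodeMajor(version) >= 20;
}

// A loader could then decide whether to expose a globalPreload hook.
const needsGlobalPreload = hasOffThreadLoaders();
```

The version argument defaults to the running Node version but can be passed explicitly, which makes the check easy to exercise against fixtures for several versions.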

Something to keep in mind: ts-node --esm launches a subprocess sorta like node --require <foo> --loader <bar> <args>. So this change will need to be compatible with both ts-node --esm and node --loader ts-node/esm.

In typechecking mode, is the typechecking work being repeated on both threads? Keeping in mind that typechecking one file involves parsing the others, so CJS files typecheck with type info from ESM files and vice-versa. Repeated typechecking work is not a dealbreaker, but it's something we should at least document in this thread.

cspotcode avatar May 06 '23 16:05 cspotcode

Codecov Report

Merging #2009 (3fd7b4f) into main (47d4f45) will increase coverage by 0.25%. The diff coverage is 62.50%.

:exclamation: Current head 3fd7b4f differs from pull request most recent head b614b1b. Consider uploading reports for the commit b614b1b to get more accurate results

Files Changed Coverage Δ
src/transpilers/swc.ts 81.81% <ø> (ø)
src/child/child-loader.ts 54.54% <38.46%> (-23.24%) :arrow_down:
src/esm.ts 78.90% <62.06%> (-3.67%) :arrow_down:
src/bin.ts 89.83% <66.66%> (-0.52%) :arrow_down:
src/index.ts 80.58% <87.50%> (+0.48%) :arrow_up:
src/child/spawn-child.ts 84.21% <100.00%> (-3.29%) :arrow_down:

... and 3 files with indirect coverage changes

codecov[bot] avatar May 06 '23 16:05 codecov[bot]

Added a commit so that the globalPreload registration is not added on node versions less than 20.0.0.

@cspotcode As far as I can tell, with this change, ts-node --esm file.ts is just as broken on node 20 (but not any more broken, at least). I'll look into that, might be a straightforward way to work around it.

I haven't looked into type checking, but my guess is, if registerAndCreateEsmHooks triggers the type checking, then yes, it'd happen twice. Though, if it aborts on failure, it'd only be done once in the failure case, because the loader runs to completion before the globalPreload is executed. It is definitely registering extensions twice, but in isolated environments.

isaacs avatar May 07 '23 04:05 isaacs

Got some time to dig into this. The issue with ts-node --esm blah.mts seems to have something to do with lateBindHooks. In node before v20, this works fine, but in v20, it doesn't pick them up. Still tracing through to try to figure out why that is.

isaacs avatar May 23 '23 05:05 isaacs

Aha, of course.

The callInChild is running node --loader=child-loader.js child-entrypoint.js <args>.

child-loader.js sets up the proxy hooks, and child-entrypoint.js assigns the values to them by calling bootstrap() from ../bin, which calls lateBindHooks. But child-entrypoint.js and child-loader.js are in separate isolated threads.

The only way this can work on node 20 is for child-loader.js to set the actual hooks itself, rather than late binding them in the main thread. child-entrypoint.js should register the require.extensions handlers, however, since that avoids the need for a globalPreload.
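
To make the failure mode concrete, here is a simplified model of late-bound hooks. It is not ts-node's actual implementation: `createProxyHooks` and the fake `realHooks` object are invented for the sketch, and "separate threads" is modelled as two independent copies of the same module state.

```javascript
// Hypothetical stand-in for the proxy hooks that a loader file sets up.
function createProxyHooks() {
  let bound = null;
  return {
    resolve(specifier) {
      if (bound === null) {
        throw new Error('hooks were never bound in this environment');
      }
      return bound.resolve(specifier);
    },
    lateBindHooks(realHooks) {
      bound = realHooks;
    },
  };
}

const realHooks = { resolve: (specifier) => `resolved:${specifier}` };

// Single environment (before v20): loader and entrypoint share one proxy.
const shared = createProxyHooks();
shared.lateBindHooks(realHooks);
const sameThreadResult = shared.resolve('./foo.mts');

// Separate environments (v20): each side has its own copy of the module
// state, so binding the entrypoint's copy never reaches the loader's copy.
const loaderCopy = createProxyHooks();
const entrypointCopy = createProxyHooks();
entrypointCopy.lateBindHooks(realHooks);
let offThreadError = null;
try {
  loaderCopy.resolve('./foo.mts');
} catch (err) {
  offThreadError = err.message;
}
```

In the shared case the proxy resolves normally. In the split case the loader's copy is still unbound, which is why the hooks have to be set up inside the loader itself.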

isaacs avatar May 23 '23 06:05 isaacs

Ok, got ts-node --esm foo.mts working, albeit in a somewhat unfortunate copypasta way. I suggest refactoring to remove the late-binding loaders entirely on all Node versions, and just set them up in the child-loader.mjs only. It seems like child-entrypoint.ts would then only be needed to munge the process.argv which could be done in a globalPreload on node v20 and higher (until that's replaced with main-thread assignment of loaders via --import), or directly in the loader on earlier versions. I held off on doing that for now, on the assumption that there might be other side-effects I'm not aware of.

Verified that typechecking does not get run twice in the presence of an off-thread loader, which in hindsight makes sense, since either the source is being loaded and transpiled once in the loader thread, or once in the main (only) thread, but never both. The whole point is that the load() function is never called in the main thread on node v20, which is what will eventually enable synchronous-looking behavior of loaders, in accordance with the browser specs.

@cspotcode PTAL when you get a chance :)

isaacs avatar May 23 '23 06:05 isaacs

Thanks, I haven't had a chance to take a look yet, but WRT the double-typechecking: I imagine it would happen when mixing CJS and ESM. E.g. you have some .mts files and some .cts files. And since typechecking one file relies on type info from others, the compiler repeats a bunch of work on both threads.

cspotcode avatar May 23 '23 15:05 cspotcode

I've been trying to throw some complicated scenarios at it, but I'm not sure how to go about triggering the situation you're thinking of here.

The userland program isn't ever loaded or typechecked by TS in the loader thread. So if there's double-typechecking happening, it seems to me that it'd be either (a) the loading of ts files from within ts-node itself (or its deps), or (b) already a problem without a loader-thread (ie, if double-checking is happening in a single-threaded loader environment).

Maybe there's something I'm missing? But it seems like this can't possibly make the problem significantly worse. If you have a test or example I can poke at, I'm happy to try to dig in further.

isaacs avatar May 25 '23 17:05 isaacs

Er, rather, it's not executed on the loader thread. Obviously the code is loaded in the loader thread.

But the typechecking happens when it compiles it to JS, and that only happens in one place. What actually ends up in the main thread is JavaScript going straight to the node VM.

isaacs avatar May 25 '23 17:05 isaacs

If double-typechecking is happening, the only visible side-effect would be higher CPU usage. So it's not a problem per se; it just means a potential performance regression on node 20.

If code on the main thread does require('./other-file') then that TS->JS transformation and typecheck happens on the main thread, right?

The scenario I imagine is:

ts-node --esm ./entrypoint.mts

Where ./entrypoint.mts has require('./other.cts')

The loader thread's load() hooks ask the TS compiler for TS diagnostics on entrypoint.mts. This means an instance of the TS compiler inside the loader thread does the work of parsing and computing type information for entrypoint.mts, other.cts, and every other file they (transitively) reference.

Then the main thread does require('./other.cts'), which asks the TS compiler for TS diagnostics on other.cts. This means another instance of the TS compiler on the main thread does the work of parsing and computing type information for other.cts and every other file it (transitively) references.

cspotcode avatar May 25 '23 21:05 cspotcode

The scenario above could also happen if entrypoint.mts does import './other.cts', which does import './another.cts' (secretly compiled to require('./another.cts')). So it could happen in a mixed CTS / MTS codebase where the source code exclusively uses import but some of those imports get compiled to require().

cspotcode avatar May 25 '23 21:05 cspotcode

Thinking through the double-checking scenario, I think you're right, but I don't think there's much to be done about it. The good news is, once source-returning commonjs loaders land in node, and ts-node starts using that instead, then the problem goes away (along with several others!), since require() will also be going through the same paths on the loader thread.

Cleaned up where the globalPreload logic lives, and a few other things.

isaacs avatar May 29 '23 18:05 isaacs

The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. service.compile() in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.

But that requires more work and it's not necessary to get this merged.

cspotcode avatar May 30 '23 02:05 cspotcode

> The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. service.compile() in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.

I was actually going to suggest something similar, having been in this code a little bit now. Using the globalPreload context.port is a bit tricky, since it means putting more logic in the sloppy mode string-literal code, and you have to feature- or version-detect to know whether the port is even going to be there. The approach I'm using with @tapjs/processinfo is to use diagnostics_channel, which has been available for quite a bit longer: https://github.com/tapjs/processinfo/blob/main/lib/esm.mts

Then at least on node versions from 14.17 up, you could use the same approach for all loaders, whether they're running in a separate thread or not. It does add a tiny bit of unnecessary serialization overhead if loaders are already on the main thread and can just call the function directly, but not much. If the intermediate child process spawned by ts-node --esm had an IPC channel, you could probably have it work in the same way.

isaacs avatar May 30 '23 03:05 isaacs

Does diagnostics_channel support blocking calls across thread boundaries? So that require('./something.ts') can be compiled off-thread?

> since it means putting more logic in the sloppy mode string-literal code

I'm not too worried about that. We should use the same approach as with the .js/.mjs shims, where all logic lives in separate .ts files, e.g. require('/abs/path/to/something-else.js').doTheWork(typeof port !== 'undefined' ? port : undefined)

> you have to feature- or version-detect to know whether the port is even going to be there

When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?

> you could use the same approach for all loaders, whether they're running in a separate thread or not

This seems not a big deal. Main and worker threads are making a blocking call to .compile(), whether that call is handled on-thread or RPCd over thread boundaries can be transparently swapped out.

cspotcode avatar May 31 '23 15:05 cspotcode

> When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?

It is always present with off-thread loaders.

But it seems I was completely mistaken about this, and while diagnostics_channel is synchronous, it doesn't (as yet) cross over to the loader thread, so you do still need to proxy through the globalPreload context.port, at least until we get import { register } from 'node:module', and that is not currently synchronous. So, at least for the near term, require.extensions and the possibility of double-typechecking is unavoidable.

isaacs avatar May 31 '23 18:05 isaacs

> When will the port be absent?

Technically speaking, it's not present on 16.0 through 16.11, which doesn't EOL until September 2023, but in those cases we don't bother to use a globalPreload, so it's not an issue.

isaacs avatar May 31 '23 18:05 isaacs

> import { register } from 'node:module', and that is not currently synchronous.

It is synchronous: https://github.com/nodejs/node/pull/46826

GeoffreyBooth avatar May 31 '23 21:05 GeoffreyBooth

> > import { register } from 'node:module', and that is not currently synchronous.
>
> It is synchronous: https://github.com/nodejs/node/pull/46826

He is right: the register function is not totally synchronous. Behind the scenes it's still async, because of the communication with the loaders worker thread.

jlenon7 avatar May 31 '23 22:05 jlenon7

For the sake of anyone else reading along:

A combination of MessageChannel & SharedArrayBuffer & Atomics can be used to make blocking RPC calls between threads.

The goal is to make blocking calls from the main thread (or a worker thread) to the loader thread. The calling thread blocks until the reply arrives, but the loader thread can answer asynchronously.

So it is possible today for require() hooks to call another thread for module resolution and compilation.
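
For readers who want to see the shape of that technique, here is a small self-contained sketch. It uses an ordinary Worker as a stand-in for the loader thread, and the "compile" step is a placeholder (upper-casing the input), so none of the names below come from ts-node. The calling thread blocks in `Atomics.wait` and then reads the reply synchronously with `receiveMessageOnPort`.

```javascript
const { Worker, MessageChannel, receiveMessageOnPort } = require('node:worker_threads');

// Stand-in for the loader thread. It is free to answer asynchronously.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', async ({ signal, port, source }) => {
    // Hypothetical "compile" step; a real service would run the compiler here.
    const output = await Promise.resolve(source.toUpperCase());
    // Queue the reply first, then flip the flag and wake the blocked caller.
    port.postMessage({ output });
    const flag = new Int32Array(signal);
    Atomics.store(flag, 0, 1);
    Atomics.notify(flag, 0);
  });
`;
const worker = new Worker(workerSource, { eval: true });

function compileSync(source, timeoutMs = 5000) {
  const signal = new SharedArrayBuffer(4);
  const flag = new Int32Array(signal);
  const { port1, port2 } = new MessageChannel();
  worker.postMessage({ signal, port: port2, source }, [port2]);
  // Block the calling thread until the worker stores 1, or the timeout expires.
  if (Atomics.wait(flag, 0, 0, timeoutMs) === 'timed-out') {
    throw new Error('no reply from worker');
  }
  // The reply is already queued on port1, so it can be read synchronously.
  const reply = receiveMessageOnPort(port1);
  port1.close();
  return reply.message.output;
}

const output = compileSync('const x = 1');
worker.terminate();
```

Each call uses a fresh SharedArrayBuffer and MessageChannel, which keeps the sketch simple at the cost of some allocation per call. A real implementation would likely reuse them.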


Sounds like the question is whether bootstrapping must be async. Do we have to await register() before Module.runMain();? Does register() give us a promise which is guaranteed to resolve after a port exists in the main thread which can talk to the loader thread? As soon as the promise resolves, we'll synchronously attempt to use the port.

cspotcode avatar May 31 '23 23:05 cspotcode

> Do we have to await register() before Module.runMain();? Does register() give us a promise

At least according to the docs in the PR, register is sync. It does not return a promise.

GeoffreyBooth avatar Jun 01 '23 00:06 GeoffreyBooth

> So it is possible today for require() hooks to call another thread for module resolution and compilation.

This would be useful to abstract into its own library for other loaders to use.

GeoffreyBooth avatar Jun 01 '23 00:06 GeoffreyBooth

What guarantees does register() make about import calls passing to the newly-registered loader?

Given register(a); await import(b), which loaders handle the import of b?

cspotcode avatar Jun 01 '23 17:06 cspotcode

> register(a); await import(b) which loaders handle the import of b?

a should handle the import of b.

GeoffreyBooth avatar Jun 01 '23 19:06 GeoffreyBooth

FYI there are further changes in the works. The import { register } from 'module' mentioned above has landed on main in Node, though we’re holding it from being released until https://github.com/nodejs/node/pull/48842 also lands, which adds the ability for register to set up the communications channels between the main and loader threads. This will eliminate the need for globalPreload, which will be removed sometime thereafter.

The way forward looks something like this. Instead of --loader ts-node/esm, you would advise your users to run something like --import ts-node/register. That new file would include code that both calls register to register the loader hooks and it would also set up the cross-thread communications, a lot of what this PR does. It could also set up require.extensions or whatever other main thread stuff ts-node wants done, without needing to spawn child processes. Hopefully this should fix ts-node for Node 20 in a cleaner way than what’s proposed in this PR.
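
A hedged sketch of what such a registration file might contain, assuming a Node version where `register` is exported from `node:module`. The hooks module here is an inline pass-through supplied as a `data:` URL purely to keep the example self-contained; a real `ts-node/register` file (a name proposed in this thread, not an existing entry point at the time of writing) would point at its own hooks file and also do its main-thread setup.

```javascript
const nodeModule = require('node:module');

// Hypothetical hooks module: a pass-through resolve hook, inlined as a
// data: URL for the sake of the example.
const hooksSource =
  'export async function resolve(specifier, context, nextResolve) {' +
  ' return nextResolve(specifier, context); }';
const hooksUrl = 'data:text/javascript,' + encodeURIComponent(hooksSource);

// Registers the hooks when the running Node supports it. Returns whether
// registration was attempted, so callers can fall back on older versions.
function registerHooks() {
  if (typeof nodeModule.register !== 'function') {
    return false;
  }
  nodeModule.register(hooksUrl);
  // Main-thread setup (for example require.extensions handlers) would go here.
  return true;
}
```

A file like this would be loaded with `node --import <file>`, with `registerHooks()` called at the top level. The feature check keeps it harmless on versions that predate `register`.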

After that, the next PR on our list is https://github.com/nodejs/node/pull/47999, which would eliminate the need for using require.extensions to handle CommonJS code; this should hopefully simplify ts-node further, as you wouldn’t need separate code paths for CommonJS and ESM.

GeoffreyBooth avatar Jul 20 '23 19:07 GeoffreyBooth