ts-node
fix: Add globalPreload to ts-node/esm for node 20
As of Node v20, loader hooks are executed in a separate, isolated thread. As a result, they are unable to register the `require.extensions` hooks in a way that would (as in other Node versions) make both CJS and ESM work as expected.
By adding a `globalPreload` method, which does execute in the main script environment (albeit with very limited capabilities), these hooks can be attached properly, and `--loader=ts-node/esm` will once again make both CJS and ESM TypeScript programs work properly.
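A minimal sketch of the idea, assuming a hypothetical CommonJS file at `/abs/path/to/register-cjs.js` that installs the `require.extensions` hooks. It follows the pattern shown in Node's loader documentation: the returned string is evaluated in the main thread in sloppy mode, where `require` is not defined, so the code rebuilds one from `getBuiltin('module')`. This is an illustration, not the code in this PR.

```typescript
// Hypothetical absolute path to a CommonJS module that installs the
// require.extensions hooks (not a real ts-node file).
const REGISTER_CJS_PATH = '/abs/path/to/register-cjs.js';

// Loader hook. On Node 20 this module runs in the loader thread, but the
// string it returns is evaluated in the main thread (sloppy mode, no
// `require`; Node provides `getBuiltin` instead).
export function globalPreload(): string {
  return [
    "const { createRequire } = getBuiltin('module');",
    "const { cwd } = getBuiltin('process');",
    // Build a require() so the preload code can load an ordinary CJS file.
    "const require = createRequire(cwd() + '/<preload>');",
    `require(${JSON.stringify(REGISTER_CJS_PATH)});`,
  ].join('\n');
}
```

Keeping the preload string to a few lines that `require()` a real file means the interesting logic can live in normal, type-checked source rather than in a string literal.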
Not sure what kind of test should be added for this, since it just returns a string. Just comparing against a fixture feels kind of redundant?
Hm, this definitely needs a bit more work, because it breaks on Node versions that don't have off-thread loader hooks (i.e., everything before v20).
I'm not sure what the idiomatic approach to that is in this project, and I don't know how to detect the situation where it's required, other than by sniffing the version.
If it helps, in node 19 vs node 20, this file:
```js
export function globalPreload() {
  console.log('globalPreload');
  return '';
}
export function getGlobalPreloadCode() {
  console.log('getGlobalPreloadCode');
  return '';
}
```
logs, in Node 19:
```
(node:43737) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"
```
and in Node 20:
```
(node:43215) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload
```
… so, you could probably intercept console.log at module level, catch the experimental warning (and suppress it), and use it to indicate whether you were in 20+ or not?
@ljharb ooh sneaky, that might work. I'll look into that.
Would this allow `ts-node` to be used directly again in Node 20, or would it still require it to be run through `node --loader`?
Eagerly awaiting the fix on this so we can support development on Node 20 🙏
Historically I've used version sniffing for this stuff, so I'd go with that. It has happened several times before that ts-node needs to implement multiple behaviors depending on the version of Node or TS.
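A sketch of version sniffing, assuming only `process.versions.node` is available. The helper names `parseVersion`, `versionGte`, and `OFF_THREAD_LOADERS` are hypothetical; the 20.0.0 threshold is the one discussed in this thread for off-thread loader hooks.

```typescript
// Parse "20.3.1" (or "v20.3.1") into numeric [major, minor, patch].
function parseVersion(version: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = version
    .replace(/^v/, '')
    .split('.')
    .map((part) => Number.parseInt(part, 10) || 0);
  return [major, minor, patch];
}

// True if `version` >= `minimum`, comparing component by component.
function versionGte(version: string, minimum: string): boolean {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // all components equal
}

// Off-thread loader hooks: Node 20.0.0 and later.
const OFF_THREAD_LOADERS = versionGte(process.versions.node, '20.0.0');
```

Centralizing the thresholds in named flags keeps the branching readable when, as noted above, several version-dependent behaviors accumulate.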
Something to keep in mind: `ts-node --esm` launches a subprocess, sorta like `node --require <foo> --loader <bar> <args>`. So this change will need to be compatible with both `ts-node --esm` and `node --loader ts-node/esm`.
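A rough sketch of how a CLI can re-exec Node with preload and loader flags, to show why both launch paths must end up with the same hooks. `buildChildArgv` and `launchChild` are hypothetical names and the specifiers are placeholders; ts-node's real child-process setup passes more state than this.

```typescript
import { spawn, ChildProcess } from 'node:child_process';

// Builds an argv shaped like "node --require <foo> --loader <bar> <script> <args>".
function buildChildArgv(
  requireSpecifier: string,
  loaderSpecifier: string,
  script: string,
  userArgs: string[],
): string[] {
  return ['--require', requireSpecifier, '--loader', loaderSpecifier, script, ...userArgs];
}

// Re-runs the current Node binary with those flags, forwarding stdio.
function launchChild(argv: string[]): ChildProcess {
  return spawn(process.execPath, argv, { stdio: 'inherit' });
}
```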
In typechecking mode, is the typechecking work being repeated on both threads? Keeping in mind that typechecking one file involves parsing the others, so CJS files typecheck with type info from ESM files and vice-versa. Repeated typechecking work is not a dealbreaker, but it's something we should at least document in this thread.
Codecov Report
Merging #2009 (3fd7b4f) into main (47d4f45) will increase coverage by 0.25%. The diff coverage is 62.50%.

:exclamation: Current head 3fd7b4f differs from pull request most recent head b614b1b. Consider uploading reports for the commit b614b1b to get more accurate results

| Files Changed | Coverage Δ | |
|---|---|---|
| src/transpilers/swc.ts | 81.81% <ø> (ø) | |
| src/child/child-loader.ts | 54.54% <38.46%> (-23.24%) | :arrow_down: |
| src/esm.ts | 78.90% <62.06%> (-3.67%) | :arrow_down: |
| src/bin.ts | 89.83% <66.66%> (-0.52%) | :arrow_down: |
| src/index.ts | 80.58% <87.50%> (+0.48%) | :arrow_up: |
| src/child/spawn-child.ts | 84.21% <100.00%> (-3.29%) | :arrow_down: |

... and 3 files with indirect coverage changes
Added a commit to skip the `globalPreload` registration on Node versions earlier than 20.0.0.
@cspotcode As far as I can tell, with this change, `ts-node --esm file.ts` is just as broken on Node 20 (but at least not any more broken). I'll look into that; there might be a straightforward way to work around it.
I haven't looked into typechecking, but my guess is that if `registerAndCreateEsmHooks` triggers the typechecking, then yes, it'd happen twice. However, if it aborts on failure, it would only be done once in the failure case, because the loader runs to completion before the `globalPreload` code is executed. It is definitely registering extensions twice, but in isolated environments.
Got some time to dig into this. The issue with `ts-node --esm blah.mts` seems to have something to do with `lateBindHooks`. In Node before v20 this works fine, but in v20 the hooks aren't picked up. Still tracing through to figure out why that is.
Aha, of course.
`callInChild` runs `node --loader=child-loader.js child-entrypoint.js <args>`.
`child-loader.js` sets up the proxy hooks, and `child-entrypoint.js` assigns the real implementations to them by calling `bootstrap()` from `../bin`, which calls `lateBindHooks`. But `child-entrypoint.js` and `child-loader.js` run in separate, isolated threads.
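A simplified model of why late binding breaks: each thread evaluates its own copy of the loader module, so binding hooks in one copy leaves the other untouched. `createLoaderModule`, `resolve`, and `lateBindHooks` here are hypothetical simplifications, not ts-node's actual code; each call to `createLoaderModule()` stands in for one thread's module instance.

```typescript
// Simplified hook signature (hypothetical).
type ResolveHook = (specifier: string) => string;

// Models one evaluated copy of a loader module. On Node 20 the main thread
// and the loader thread each get their own copy.
function createLoaderModule() {
  let bound: ResolveHook | undefined; // filled in later by lateBindHooks()

  // Proxy hook handed to Node up front; forwards to the late-bound implementation.
  const resolve: ResolveHook = (specifier) => {
    if (!bound) throw new Error('hooks were never bound in this thread');
    return bound(specifier);
  };

  const lateBindHooks = (impl: ResolveHook): void => {
    bound = impl;
  };

  return { resolve, lateBindHooks };
}

const loaderThreadCopy = createLoaderModule(); // where Node calls resolve()
const mainThreadCopy = createLoaderModule(); // where bootstrap() runs

// bootstrap() in the entrypoint binds hooks in *its own* copy only.
mainThreadCopy.lateBindHooks((s) => s.replace(/\.js$/, '.ts'));

console.log(mainThreadCopy.resolve('./a.js')); // ./a.ts
try {
  loaderThreadCopy.resolve('./a.js');
} catch (err) {
  console.log((err as Error).message); // hooks were never bound in this thread
}
```

Before Node 20 both roles shared one module instance, so the late binding reached the hooks Node was actually calling.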
The only way this can work on Node 20 is for `child-loader.js` to set the actual hooks itself, rather than late-binding them from the main thread. `child-entrypoint.js` should still register the `require.extensions` handlers, however; that avoids the need for a `globalPreload`.
Ok, got `ts-node --esm foo.mts` working, albeit in a somewhat unfortunate copypasta way. I suggest refactoring to remove the late-binding loaders entirely on all Node versions and setting them up only in `child-loader.mjs`. It seems like `child-entrypoint.ts` would then only be needed to munge `process.argv`, which could be done in a `globalPreload` on Node v20 and higher (until that's replaced with main-thread assignment of loaders via `--import`), or directly in the loader on earlier versions. I held off on doing that for now, on the assumption that there might be other side effects I'm not aware of.
Verified that typechecking does not get run twice in the presence of an off-thread loader, which in hindsight makes sense: the source is either loaded and transpiled once in the loader thread, or once in the main (only) thread, but never both. The whole point is that the `load()` function is never called in the main thread on Node v20, which is what will eventually enable synchronous-looking behavior of loaders, in accordance with the browser specs.
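For reference, a `load` hook has the shape `load(url, context, nextLoad)`. The sketch below assumes a hypothetical `compileTs` placeholder in place of ts-node's real compiler service, and the hand-written types are simplified stand-ins for Node's. It counts compilations per URL to illustrate the point above: whichever thread runs `load()` does the compile work, and on Node 20 that is the loader thread.

```typescript
import { TextDecoder } from 'node:util';

// Simplified stand-ins for Node's loader hook types.
interface LoadContext {
  format?: string | null;
  conditions?: string[];
}
interface LoadResult {
  format: string;
  source?: string | ArrayBuffer | Uint8Array;
  shortCircuit?: boolean;
}
type NextLoad = (url: string, context?: LoadContext) => Promise<LoadResult>;

// How often each URL was compiled in *this* thread.
export const compiledUrls = new Map<string, number>();

// Hypothetical placeholder for the transpile + typecheck step.
function compileTs(code: string, url: string): string {
  compiledUrls.set(url, (compiledUrls.get(url) ?? 0) + 1);
  return code; // a real implementation would call the TypeScript compiler here
}

// On Node 20, Node calls this in the loader thread only.
export async function load(url: string, context: LoadContext, nextLoad: NextLoad): Promise<LoadResult> {
  if (!/\.m?ts$/.test(url)) return nextLoad(url, context); // not TypeScript: defer
  const { source } = await nextLoad(url, { ...context, format: 'module' });
  const text = typeof source === 'string' ? source : new TextDecoder().decode(source);
  return { format: 'module', source: compileTs(text, url), shortCircuit: true };
}
```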
@cspotcode PTAL when you get a chance :)
Thanks, I haven't had a chance to take a look yet, but WRT the double-typechecking:
I imagine it would happen when mixing CJS and ESM, e.g. you have some `.mts` files and some `.cts` files. Since typechecking one file relies on type info from others, the compiler repeats a bunch of work on both threads.
I've been trying to throw some complicated scenarios at it, but I'm not sure how to go about triggering the situation you're thinking of here.
The userland program isn't ever loaded or typechecked by TS in the loader thread. So if there's double-typechecking happening, it seems to me that it'd be either (a) the loading of TS files from within ts-node itself (or its deps), or (b) already a problem without a loader thread (i.e., if double-checking is happening in a single-threaded loader environment).
Maybe there's something I'm missing? But it seems like this can't possibly make the problem significantly worse. If you have a test or example I can poke at, I'm happy to try to dig in further.
Er, rather, it's not executed on the loader thread. Obviously the code is loaded in the loader thread.
But the typechecking happens when it compiles it to JS, and that only happens in one place. What actually ends up in the main thread is JavaScript going straight to the node VM.
If double-typechecking is happening, the only visible side effect would be higher CPU usage. So it's not a problem per se; it just means a potential performance regression on Node 20.
If code on the main thread does `require('./other-file')`, then that TS->JS transformation and typecheck happen on the main thread, right?
The scenario I imagine is:

1. `ts-node --esm ./entrypoint.mts`, where `./entrypoint.mts` has `require('./other.cts')`.
2. The loader thread's `load()` hooks ask the TS compiler for TS diagnostics on `entrypoint.mts`.
3. This means an instance of the TS compiler inside the loader thread does the work of parsing and computing type information for `entrypoint.mts`, `other.cts`, and every other file they (transitively) reference.
4. Then the main thread does `require('./other.cts')`, which asks the TS compiler for TS diagnostics on `other.cts`.
5. This means another instance of the TS compiler, on the main thread, does the work of parsing and computing type information for `other.cts` and every other file it (transitively) references.

The scenario above could also happen if `entrypoint.mts` does `import './other.cts'`, which does `import './another.cts'` (which secretly compiles to `require('./another.cts')`). So it could happen in a mixed CTS/MTS codebase where the source code exclusively uses `import`, but some of it gets compiled to `require()`.
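A toy model of that scenario may help. `ToyCompiler` is hypothetical, standing in for one TypeScript compiler instance with a private cache, and the dependency graph is made up. Two instances (one per thread) each parse `other.cts` and everything it references, so shared files are parsed twice; a single shared instance would parse each file once.

```typescript
// Made-up project: file -> files it references.
const graph: Record<string, string[]> = {
  'entrypoint.mts': ['other.cts'],
  'other.cts': ['shared.d.ts'],
  'shared.d.ts': [],
};

// How many times each file is parsed across *all* compiler instances.
const parseCount = new Map<string, number>();

// Stand-in for one compiler instance; its cache is private, like a
// per-thread language service.
class ToyCompiler {
  private parsed = new Set<string>();

  diagnostics(file: string): void {
    // Typechecking `file` requires parsing it and everything it references.
    const stack = [file];
    while (stack.length > 0) {
      const f = stack.pop()!;
      if (this.parsed.has(f)) continue;
      this.parsed.add(f);
      parseCount.set(f, (parseCount.get(f) ?? 0) + 1);
      stack.push(...(graph[f] ?? []));
    }
  }
}

const loaderThread = new ToyCompiler();
const mainThread = new ToyCompiler();

loaderThread.diagnostics('entrypoint.mts'); // ESM path, via load()
mainThread.diagnostics('other.cts'); // CJS path, via require()

parseCount.forEach((count, file) => console.log(file, count));
// entrypoint.mts 1
// other.cts 2
// shared.d.ts 2
```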
Thinking through the double-checking scenario, I think you're right, but I don't think there's much to be done about it. The good news is that once source-returning CommonJS loaders land in Node and ts-node starts using them, the problem goes away (along with several others!), since `require()` will also go through the same paths on the loader thread.
Cleaned up where the `globalPreload` logic lives, and a few other things.
The ideal, until Node supports CJS-via-loaders, is that we use the message port to delegate all compilation to the loader thread. `service.compile()` in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.
But that requires more work, and it's not necessary to get this merged.
> The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. service.compile() in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.

I was actually going to suggest something similar, having spent a little time in this code now. Using the `globalPreload` `context.port` is a bit tricky, since it means putting more logic in the sloppy-mode string-literal code, and you have to feature- or version-detect to know whether the port is even going to be there. The approach I'm using with @tapjs/processinfo is to use `diagnostics_channel`, which has been available for quite a bit longer: https://github.com/tapjs/processinfo/blob/main/lib/esm.mts
Then, at least on Node versions from 14.17 up, you could use the same approach for all loaders, whether they're running in a separate thread or not. It does add a tiny bit of unnecessary serialization overhead if loaders are already on the main thread and could just call the function directly, but not much. If the intermediate child process spawned by `ts-node --esm` had an IPC channel, you could probably have it work in the same way.
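To make the "synchronous" property concrete: a `diagnostics_channel` subscriber runs inside `publish()` on the publishing thread, so the message is delivered before `publish()` returns. The channel name and payload below are hypothetical, and the top-level `subscribe(name, fn)` form assumes Node 16.17+ or 18.7+. This shows in-thread delivery only; it says nothing about crossing into a loader thread.

```typescript
import * as diagnosticsChannel from 'node:diagnostics_channel';

// Hypothetical channel name; any string works.
const CHANNEL = 'ts-node:compile-request';

const received: unknown[] = [];

// Subscribers are invoked synchronously from publish().
diagnosticsChannel.subscribe(CHANNEL, (message) => {
  received.push(message);
});

diagnosticsChannel.channel(CHANNEL).publish({ filename: '/abs/path/to/file.ts' });

// Delivery already happened before publish() returned.
console.log(received.length); // 1
```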
Does `diagnostics_channel` support blocking calls across thread boundaries, so that `require('./something.ts')` can be compiled off-thread?
> since it means putting more logic in the sloppy mode string-literal code

I'm not too worried about that. We should use the same approach as with the `.js`/`.mjs` shims, where all logic lives in separate `.ts` files: `require('/abs/path/to/something-else.js').doTheWork(typeof port !== 'undefined' ? port : undefined)`
> you have to feature- or version-detect to know whether the port is even going to be there

When will the port be absent? My understanding is that it's always present with off-thread loaders. Are there any non-EOL Node versions that don't have it?
> you could use the same approach for all loaders, whether they're running in a separate thread or not

This doesn't seem like a big deal. Main and worker threads make a blocking call to `.compile()`; whether that call is handled on-thread or RPC'd across thread boundaries can be swapped out transparently.
> When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?

It is always present with off-thread loaders.
But it seems I was completely mistaken about this: while `diagnostics_channel` is synchronous, it doesn't (as yet) cross over to the loader thread, so you still need to proxy through the `globalPreload` `context.port`, at least until we get `import { register } from 'node:module'`, and that is not currently synchronous. So, at least for the near term, `require.extensions` and the possibility of double-typechecking are unavoidable.
> When will the port be absent?

Technically speaking, it's not present on 16.0 through 16.11, which don't reach EOL until September 2023, but in those cases we don't bother to use a `globalPreload`, so it's not an issue.
> import { register } from 'node:module', and that is not currently synchronous.

It is synchronous: https://github.com/nodejs/node/pull/46826
> > import { register } from 'node:module', and that is not currently synchronous.
>
> It is synchronous: https://github.com/nodejs/node/pull/46826

He is right: the `register` function is not entirely synchronous. Behind the scenes it's still async, because of the communication with the loaders worker thread.
For the sake of anyone else reading along: a combination of `MessageChannel`, `SharedArrayBuffer`, and `Atomics` can be used to make blocking RPC calls between threads.
The goal is to make blocking calls from the main thread to the loader thread using `MessageChannel`/`SharedArrayBuffer`/`Atomics`. We want to block the main thread or worker thread, but the loader thread can answer asynchronously.
So it is possible today for `require()` hooks to call another thread for module resolution and compilation.
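A minimal sketch of that technique. Assumptions: the helper thread is an ordinary `Worker` standing in for the loader thread, `compileSync` is a hypothetical name, and the "compilation" is just `toUpperCase()`. Node allows `Atomics.wait` on the main thread (browsers do not).

```typescript
import { MessageChannel, Worker, receiveMessageOnPort } from 'node:worker_threads';

// Worker body, evaluated as CommonJS via `eval: true`. It answers each
// request asynchronously, then wakes the blocked caller.
const WORKER_SOURCE = `
const { workerData } = require('node:worker_threads');
const { port, signal } = workerData;
port.on('message', async ({ id, source }) => {
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for async work
  port.postMessage({ id, result: source.toUpperCase() });
  Atomics.store(signal, 0, 1); // flip the flag...
  Atomics.notify(signal, 0);   // ...and wake the waiting thread
});
`;

// One shared 32-bit slot: 0 = waiting, 1 = reply posted.
const signal = new Int32Array(new SharedArrayBuffer(4));
const { port1, port2 } = new MessageChannel();
const worker = new Worker(WORKER_SOURCE, {
  eval: true,
  workerData: { port: port2, signal },
  transferList: [port2],
});
worker.unref(); // don't keep the process alive just for the helper
port1.unref();

let nextId = 0;

// Looks synchronous to the caller, e.g. a require() hook.
function compileSync(source: string, timeoutMs = 5000): string {
  const id = nextId++;
  Atomics.store(signal, 0, 0);
  port1.postMessage({ id, source });
  const status = Atomics.wait(signal, 0, 0, timeoutMs); // blocks this thread
  if (status === 'timed-out') throw new Error('RPC timed out');
  // The reply is already queued on port1; read it without the event loop.
  const reply = receiveMessageOnPort(port1);
  if (!reply || reply.message.id !== id) throw new Error('missing or mismatched reply');
  return reply.message.result;
}

console.log(compileSync('const answer = 42;')); // CONST ANSWER = 42;
```

The caller blocks in `Atomics.wait` while the helper answers asynchronously; `receiveMessageOnPort` then reads the reply without returning to the event loop, which is what makes this usable from a synchronous hook. The timeout guards against a helper that never answers.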
Sounds like the question is whether bootstrapping must be async. Do we have to `await register()` before `Module.runMain();`? Does `register()` give us a promise that is guaranteed to resolve after a `port` exists in the main thread which can talk to the loader thread? As soon as the promise resolves, we'll synchronously attempt to use the port.
> Do we have to await register() before Module.runMain();? Does register() give us a promise

At least according to the docs in the PR, `register` is sync. It does not return a promise.
> So it is possible today for require() hooks to call another thread for module resolution and compilation.

This would be useful to abstract into its own library for other loaders to use.
What guarantees does `register()` make about import calls passing to the newly registered loader? Given `register(a); await import(b)`, which loaders handle the import of `b`?
> register(a); await import(b) which loaders handle the import of b?

`a` should handle the import of `b`.
FYI, there are further changes in the works. The `import { register } from 'module'` mentioned above has landed on `main` in Node, though we're holding it back from release until https://github.com/nodejs/node/pull/48842 also lands, which adds the ability for `register` to set up the communication channels between the main and loader threads. This will eliminate the need for `globalPreload`, which will be removed sometime thereafter.
The way forward looks something like this. Instead of `--loader ts-node/esm`, you would advise your users to run something like `--import ts-node/register`. That new file would include code that both calls `register` to register the loader hooks and sets up the cross-thread communication, much of what this PR does. It could also set up `require.extensions` or whatever other main-thread work `ts-node` wants done, without needing to spawn child processes. Hopefully this will fix `ts-node` for Node 20 in a cleaner way than what's proposed in this PR.
After that, the next PR on our list is https://github.com/nodejs/node/pull/47999, which would eliminate the need for using `require.extensions` to handle CommonJS code; this should hopefully simplify `ts-node` further, as you wouldn't need separate code paths for CommonJS and ESM.