sentry-javascript icon indicating copy to clipboard operation
sentry-javascript copied to clipboard

import hook incompatible with `tsx`

Open nwalters512 opened this issue 1 year ago • 24 comments

Is there an existing issue for this?

  • [X] I have checked for existing issues https://github.com/getsentry/sentry-javascript/issues
  • [X] I have reviewed the documentation https://docs.sentry.io/
  • [X] I am using the latest SDK release https://github.com/getsentry/sentry-javascript/releases

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

8.0.0

Framework Version

No response

Link to Sentry event

No response

SDK Setup

Sentry.init({
  dsn: 'https://[email protected]/0',
});

Steps to Reproduce

I've created a minimal reproduction here: https://github.com/nwalters512/sentry-v8-tsx-error-repro

  1. Clone the repository
  2. Install dependencies with yarn
  3. Run yarn tsx src/index.ts
  4. Observe that the process fails with the error TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file

Several observations that are hopefully helpful to y'all:

  • This only occurs when "allowJs": true is set in tsconfig.json. As Sentry isn't concerned with this, that made me think this problem might actually be in tsx. However...

  • When adding instrumentation directly with the @opentelemetry/* packages, everything works fine. This is reproducible by making the following change to src/index.ts:

    -import './instrument-sentry.js';
    +import './instrument-opentelemetry.js';
    

    Given this, I'm strongly inclined to believe that this is an issue with the way in which Sentry is using OpenTelemetry.

  • This only breaks for core Node modules like util, fs, etc. Importing other modules works fine. For instance, the following change to src/index.ts makes it work without erroring:

    -await import('util');
    +await import('zod');
    
  • This only breaks for dynamic imports. For instance, the following change to src/index.ts makes it work without erroring:

    -await import('util');
    +import 'util';
    

Expected Result

I would expect the process to complete successfully and log util imported!.

Actual Result

The process errors out while importing util and does not print util imported!.

nwalters512 avatar May 13 '24 23:05 nwalters512

The full stack trace is:

TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at new NodeError (node:internal/errors:406:5)
    at fileURLToPath (node:internal/url:1393:11)
    at finalizeResolution (node:internal/modules/esm/resolve:234:42)
    at moduleResolve (node:internal/modules/esm/resolve:845:10)
    at defaultResolve (node:internal/modules/esm/resolve:1043:11)
    at nextResolve (node:internal/modules/esm/hooks:833:28)
    at y (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:2079)
    at j (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:3198)
    at nextResolve (node:internal/modules/esm/hooks:833:28)
    at resolve (/Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/@opentelemetry/instrumentation/node_modules/import-in-the-middle/hook.js:238:23) {
  code: 'ERR_INVALID_URL_SCHEME'
}

And here is the bottom of the call stack.

Maybe the loader hook added by import-in-the-middle is somehow clashing with the loader hooks added by tsx?

timfish avatar May 14 '24 11:05 timfish

Looking at the stack trace, although import-in-the-middle is at the bottom, further up we see frames from tsx:

    at y (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:2079)
    at j (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:3198)

I guess it's not handling the params the iitm adds to the url?

So I've opened an issue there: https://github.com/privatenumber/tsx/issues/554

timfish avatar May 14 '24 14:05 timfish

@timfish thanks for investigating this; your small reproduction that used just import-in-the-middle was very helpful. I'm relatively confident that the issue here is in fact with import-in-the-middle and the fact that it appends query parameters to the URL it returns from its resolvers. That is, consider this line:

https://github.com/DataDog/import-in-the-middle/blob/00b01fff1f5b69dd25e307593ec54d1d8abb4844/hook.js#L259

Changing that to just url: url.url makes your reproduction complete without error (though I have no idea what effect that has on import-in-the-middle actually working).

Specifically, it looks like the issue is that this URL with ?iitm=true ends up as the parentURL in a subsequent resolution, which you can see by adding console.log(specifier, context) immediately after https://github.com/privatenumber/tsx/blob/e2afc60bbcc299e09a5bf0e0c8909b6766b633a2/src/esm/hook/resolve.ts#L39:

file:///Users/nathan/git/tsx/node_modules/.pnpm/[email protected]/node_modules/import-in-the-middle/lib/register.ts {
  conditions: [ 'node', 'import', 'node-addons' ],
  importAttributes: {},
  parentURL: 'node:util?iitm=true'
}

nwalters512 avatar May 14 '24 16:05 nwalters512

Tracing the resolving of the otel hooking, it looks like ?iitm=true is used so that iitm knows that the resolving is actually coming from its hook rather than the original location.

Adding the query to the parent URL should be fine as this is still a valid URL.

For example, the following code runs fine under both node and tsx:

import { register } from "node:module";

register(
  new URL(
    `data:application/javascript;base64,${Buffer.from(
      `
    export async function resolve(specifier, context, nextResolve) {
      if (context.parentURL) {
        context.parentURL += "?some=query";
      }
      console.log("resolve", specifier, context);
      return nextResolve(specifier, context);
    }
`
    ).toString("base64")}`
  ),
  import.meta.url
);

await import("node:util");

and outputs:

resolve node:util {
  conditions: [ 'node', 'import', 'node-addons' ],
  importAssertions: {},
  parentURL: 'file:///Users/tim/Documents/repro/test.ts?some=query'
}

node:util?iitm=true is also a valid URL so I'm not fully convinced that's causing the issue yet!

timfish avatar May 14 '24 16:05 timfish

Changing that to just url: url.url makes your reproduction complete without error (though I have no idea what effect that has on import-in-the-middle actually working).

Ah, I just re-read this and it makes sense that this is the cause.

timfish avatar May 14 '24 16:05 timfish

I think you make have just realized this, but I'll finish this comment just so we're both on the same page. Your example is subtly different from what's happening in the failure case. In your example, parentURL is indeed still a valid file:// URL. However, in the failure case, it is not. This was easy to prove to myself:

const { fileURLToPath } = require('url');
fileURLToPath('node:util?foo=bar');

The above errors out exactly the same as the failure case with import-in-the-middle:

Uncaught TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at fileURLToPath (node:internal/url:1463:11) {
  code: 'ERR_INVALID_URL_SCHEME'
}

nwalters512 avatar May 14 '24 16:05 nwalters512

You're right, but fileURLToPath('node:util') also throws with the same error!

I think the issue isn't the query string, it's more that import-in-the-middle results in the parentURL being the node built-in which I guess this should never normally be possible because it's not a valid URL.

timfish avatar May 14 '24 17:05 timfish

For example, when the minimal reproduction is run through plain old Node:

import { register } from "node:module";

register("import-in-the-middle/hook.mjs", import.meta.url);
await import("node:util");

The iitm resolve function with this added:

    console.log('resolve', [specifier, context.parentURL])

Results in:

resolve [
  'node:util',
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/test.js'
]
resolve [
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  'node:util?iitm=true'
]
resolve [ 
  'node:util?iitm=true', 
  'node:util?iitm=true' 
]

timfish avatar May 14 '24 17:05 timfish

All great observations! I managed to make a very isolated reproduction: https://github.com/nwalters512/register-hook-playground

It uses neither tsx nor import-in-the-middle; it copies the bare minimum amount of behavior from import-in-the-middle to be able to reproduce:

node:internal/modules/run_main:129
    triggerUncaughtException(
    ^
TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at fileURLToPath (node:internal/url:1463:11)
    at finalizeResolution (node:internal/modules/esm/resolve:266:42)
    at moduleResolve (node:internal/modules/esm/resolve:933:10)
    at defaultResolve (node:internal/modules/esm/resolve:1157:11)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at resolve (file:///Users/nathan/git/register-hook-playground/hook2.mjs:4:21)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:304:30)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:196:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:825:20) {
  code: 'ERR_INVALID_URL_SCHEME'
}

It turns on that the query params were a red herring of sorts. The presence of it actually triggers another behavior in import-in-the-middle that produces synthetic source code for the module being loaded (in this case, node:util?iitm=true):

https://github.com/DataDog/import-in-the-middle/blob/00b01fff1f5b69dd25e307593ec54d1d8abb4844/hook.js#L266

This puts us in a situation that Node would normally never expect to encounter: parentURL refers to a builtin (node:import). You can reproduce this even more simply with the following hook code (in my repro as hook-simpler.mjs):

export async function resolve(specifier, context, parentResolve) {
  specifier = 'file:///dost-not-exist.mjs';
  context.parentURL = 'node:util';
  const url = await parentResolve(specifier, context, parentResolve);
}

This is obviously very contrived, but it reproduces the problem very simply: Node's parentResolve chokes when specifier is a a file:// URL and context.parentURL is not a file:// URL. This wasn't reproducible in your above example because Node seems to short-circuit if specifier is a core module, it never even tries to use context.parentURL in that case.

I'm feeling pretty confident that this is a bug in import-in-the-middle, as that's the one that ends up generating synthetic code for node:util that ends up trying to import some file:// thing. Do you want me to move this issue there?

nwalters512 avatar May 14 '24 18:05 nwalters512

What I don't yet fully understand is why iitm works with Node as it's using the same parentURLs.

For example these are the logs from resolve in import-in-the-middle calling down to the default resolver:

defaultResolve [
  'node:util',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/test.js'
  }
]
result [Object: null prototype] { url: 'node:util' }
defaultResolve [
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'node:util?iitm=true'
  }
]
result [Object: null prototype] {
  url: 'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  format: 'commonjs'
}
defaultResolve [
  'node:util?iitm=true',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'node:util?iitm=true'
  }
]
result [Object: null prototype] { url: 'node:util' }

It appears to handle the node: URLs without error!

From the stack trace, we can see that when tsx is involved, it's called after iitm, ie. iitm > tsx > default.

However, my console logs from tsx resolve aren't being outputted before the exception so it's really hard to see if/how the parameters are being modified before they hit the default resolver.

My best guess is that they're not getting flushed from the loader hooks thread and there's no obvious way to debug it otherwise.

timfish avatar May 14 '24 19:05 timfish

I think I have an answer: this only happens when the file that we're trying to resolve with node:util as its parent does not exist! You can make the following change to hook.mjs in https://github.com/nwalters512/register-hook-playground to demonstrate this:

-  const iitmURL = new URL('lib/register.mjs', import.meta.url).toString();
+  const iitmURL = new URL('register.mjs', import.meta.url).toString();

Note that the updated version points to a file that actually exists; when that happens, there's no error!

So... maybe this is a bug in Node? If nothing else, it's unexpected behavior. I think he expected behavior in the does-not-exist case is that we'd get a ERR_MODULE_NOT_FOUND, not ERR_INVALID_URL_SCHEME. This can be demonstrated with the following hook:

export async function resolve(specifier, context, parentResolve) {
  await parentResolve('file:///does-not-exist.mjs', context, parentResolve);
}

Which errors with the following:

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/does-not-exist.mjs' imported from /Users/nathan/git/register-hook-playground/index.mjs
    at finalizeResolution (node:internal/modules/esm/resolve:265:11)
    at moduleResolve (node:internal/modules/esm/resolve:933:10)
    at defaultResolve (node:internal/modules/esm/resolve:1157:11)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at resolve (file:///Users/nathan/git/register-hook-playground/hook.mjs:2:9)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:304:30)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:196:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:825:20)
    at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28) {
  code: 'ERR_MODULE_NOT_FOUND',
  url: 'file:///does-not-exist.mjs'
}

Aside: tsx actually relies on ERR_MODULE_NOT_FOUND being thrown so that it can tell when files don't exist.

nwalters512 avatar May 14 '24 19:05 nwalters512

So to summarize, the combination of tsx and import-in-the-middle means that we end in a situation with the following properties:

  • We're trying to resolve a file:///... URL
  • The file referenced by the file:/// URL does not exist
  • The context.parentURL is a node: (core) module

This is a perfect storm that produces unexpected behavior in Node's own resolution process.

nwalters512 avatar May 14 '24 19:05 nwalters512

Ah got it, the error is being thrown from here. It's throwing while it's trying to create the error message: https://github.com/nodejs/node/blob/87b87a8f6017545e2c9d6048d7b081f73c8a1072/lib/internal/modules/esm/resolve.js#L265

timfish avatar May 14 '24 19:05 timfish

Node tries to check if the parentURL is valid here, but it doesn't guard against node:* URLs and later just throws it into fileURLToPath to create the error message.

If we report this to Node they might want to throw on all node: parentURLs which would break import-in-the-middle 🤔

I'll open an issue there first and explain the detail...

timfish avatar May 14 '24 19:05 timfish

Excellent find! Sounds good; I'll check out the issue once you make it.

nwalters512 avatar May 14 '24 19:05 nwalters512

Link to the node issues: https://github.com/nodejs/node/issues/52987

timfish avatar May 14 '24 20:05 timfish

I opened a PR to fix this in import-in-the-middle: https://github.com/DataDog/import-in-the-middle/pull/76

nwalters512 avatar May 15 '24 16:05 nwalters512

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package. DDs shipping cadedance has not been particularly frequent, and this is going to be a major bottleneck for ESM adoption.

lilouartz avatar May 20 '24 14:05 lilouartz

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org: https://github.com/nodejs/admin/issues/858

timfish avatar May 21 '24 10:05 timfish

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org: nodejs/admin#858

Node org is equally slow to make releases.

If this is going to be a bottleneck to Sentry v8 adoption, you want to have more control over it.

It is small enough package to make little difference even if there are multiple versions of it maintained, i.e. a copy can be incubated under Sentry and if Node.js wants to merge upstream, let them do it.

lilouartz avatar May 21 '24 13:05 lilouartz

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org: nodejs/admin#858

Node org is equally slow to make releases.

If this is going to be a bottleneck to Sentry v8 adoption, you want to have more control over it.

It is small enough package to make little difference even if there are multiple versions of it maintained, i.e. a copy can be incubated under Sentry and if Node.js wants to merge upstream, let them do it.

Normally, I'd agree, it would be nice if we could fork it. But sadly, in this case we can't do this because import-in-the-middle is a dependency of all the other opentelemetry instrumentation, which would not use our fork :( So we need to make fixes upstream there and live with the slower cadence 😬

mydea avatar May 21 '24 14:05 mydea

I might have found a temporary workaround...

@nwalters512 Does this work as a temporary fix to get tsx working for your use case?

import { register } from "node:module";
import * as Sentry from "@sentry/node";

// register a loader hook before import-in-the-middle that 
// ensures parentURL is a valid file URL before it hits Node
register(
  new URL(
    `data:application/javascript,
  export async function resolve(specifier, context, nextResolve) {
    if(context.parentURL.startsWith('node:')) { context.parentURL = 'file:///'; }
    return nextResolve(specifier, context);
  }`
  ),
  import.meta.url
);

Sentry.init({ dsn: '__DSN__' });

await import("node:util");

This doesn't result in any errors in this basic example.

timfish avatar May 21 '24 19:05 timfish

@timfish what ultimately fixed things for me was actually https://github.com/getsentry/sentry-javascript/pull/12043; we don't use performance instrumentation, so the error went away when Sentry stopped trying to unnecessarily instrument modules. This is of course not a general solution, and yours might very well be a reasonable bandaid for anyone who is using both tsx and Sentry tracing while we wait for import-in-the-middle maintainers to do their thing!

nwalters512 avatar May 21 '24 20:05 nwalters512

There are multiple outstanding PRs for import-in-the-middle (including the one from @nwalters512 🙏) that combined hopefully fix a wide range of issues.

Until they make it to a release, I've combined these into a patch that can be used with patch-package: https://github.com/getsentry/sentry-javascript/issues/12242#issuecomment-2133993135

If anyone can confirm that this fixes their issues that would be great!

timfish avatar May 24 '24 10:05 timfish

Does the fact that this PR is closed means that now Sentry is compatible with tsx?

lilouartz avatar Jun 06 '24 15:06 lilouartz

This issue is still open https://github.com/getsentry/sentry-javascript/issues/12357, but this specific issue is fixed.

Just needs to be released - that will come out tomorrow most likely.

AbhiPrasad avatar Jun 06 '24 16:06 AbhiPrasad

Hello, we've just released v8.8.0, which should hopefully resolve this ESM problem. Let us know if you updated and are still running into problems!

mydea avatar Jun 07 '24 08:06 mydea

This issue might be fixed, but unfortunately tsx always adds it's loaders before others passed via --import so it's still not compatible: https://github.com/getsentry/sentry-javascript/issues/12357 https://github.com/privatenumber/tsx/issues/571

import-in-the-middle parses the loaded code to find the imports and with the current order of the loaders, import-in-the-middle is passed TypeScript code which it cannot parse.

timfish avatar Jun 07 '24 11:06 timfish