nft icon indicating copy to clipboard operation
nft copied to clipboard

fix(resolution): Fix resolution of querystringed imports in ESM files

Open lobsterkatie opened this issue 3 years ago • 13 comments

Currently, there are two problems with the way @vercel/nft handles ESM imports with querystrings (import * as something from './someFile?someQuerystring';):

  • It treats the ? as it would any other character. As a result, it fails to resolve such imports to the non-querystringed files to which they correspond. (In the above example, it looks for a file named someFile?someQuerystring.js and, as one would expect, doesn't find it.) While this is the correct behavior for require calls in CJS files, it doesn't match the way that Node handles ESM imports (which is to strip the querystring off before attempting to resolve the file).
  • In cases where a custom resolve method is provided, and that custom method does strip querystrings from ESM imports, the querystrings are then not put back on before the imported module is processed. This matters, for example, in cases where readFile is also overridden, with a method whose output is affected by the querystring's value. (Webpack is such a system, so one place this shows up is in Next.js's NextTraceEntryPointsPlugin.)

This fixes both of those problems by selectively stripping and re-adding the querystring in ESM imports at various points in the nodeFileTrace lifecycle. More specifically, in resolveDependecy and in emitDependency and its helpers, we strip the querystring (and/or don't add it back), with the exception of:

  • When we're marking a path as seen, we keep the querystring so that having a different querystring makes the process run again.
  • When we're calling readFile, we keep the querystring, since some implementations of readFile, including the one in the nft webpack plugin, read different modules out of memory with different querystrings).
  • When we're calling resolve, we add the querystring back in, since this is also something which can be supplied by the user, and and the user-supplied version might care about query strings.
  • When we're caching analysis results, we keep the querystring, since again here, we may have different code to analyze if we have a different querystring.
  • When we're making the recursive call to emitDependency, we add the querystring back in, since we need it there to be able to start the whole cycle over.

Notes:

  • Because the behavior here depends on whether or not a module is ESM, I needed to pass cjsResolve to the recursive call of emitDependency. Rather than change emitDependency's signature, I opted to make emitDependency a wrapper for an inner function (analyzeAndEmitDependency) with the updated signature. Recursive calls now go to the inner function.
  • For context, we on the @sentry/nextjs team are interested in this because we use a webpack loader to insert in the dependency tree a proxy module for each page, which allows us to automatically wrap users' exported functions (getServerSideProps and friends, API route handlers, etc). To do this, we replace the code from /path/to/pages/somePage.js with our proxy code, which includes an import from './somePage?__sentry_wrapped__'. When webpack then crawls our proxy module looking for dependencies, the querystring tricks it into thinking it hasn't yet dealt with that page, so it forwards it to our loader a second time rather than skipping it. When our loader sees the querystring, it knows the code replacement has already been done, and returns the original page code untouched, thereby preserving the continuity of the dependency tree. This fix ensures that @vercel/nft successfully traces the modified tree.
  • The unit test testing that this doesn't break the (already correct) behavior in CJS when dealing with files with question marks in their names (cjs-querystring) required a little gymnastics in order to not break CI on Windows. It turns out that skipping the test on Windows wasn't enough - the mere fact that a file exists with a ? in its name is enough to cause Windows not to even be able to check out the commit. Therefore, the file is stored without any punctuation in its name, and before the tests run on non-Windows platforms, it is copied to a version which does have the punctuation, and which is git-ignored to enforce it not ending up in the repo in the future by accident. The dummy, no-punctuation version is stored in a folder in order to differentiate its path from that of the punctuated version by more than just the punctuation. That way, if the punctuated version is found, it's clear it's not just because the punctuation is somehow being ignored.

Fixes https://github.com/getsentry/sentry-javascript/issues/5998. Fixes https://github.com/vercel/nft/issues/319.

lobsterkatie avatar Nov 03 '22 04:11 lobsterkatie

@ijjk or @styfle, I'm wondering if you can give me some guidance on how to write tests for this. I think I probably need to add some webpack wrapper unit tests, but a) I'm not sure that's right, and b) it's not clear how those wrappers are generated.

lobsterkatie avatar Nov 03 '22 04:11 lobsterkatie

Codecov Report

Merging #322 (3eb9da4) into main (87a4849) will increase coverage by 0.01%. The diff coverage is 85.71%.

:exclamation: Current head 3eb9da4 differs from pull request most recent head a7c4822. Consider uploading reports for the commit a7c4822 to get more accurate results

@@            Coverage Diff             @@
##             main     #322      +/-   ##
==========================================
+ Coverage   81.23%   81.25%   +0.01%     
==========================================
  Files          13       13              
  Lines        1503     1520      +17     
  Branches      553      558       +5     
==========================================
+ Hits         1221     1235      +14     
- Misses        115      118       +3     
  Partials      167      167              
Impacted Files Coverage Δ
src/node-file-trace.ts 89.66% <84.84%> (-0.64%) :arrow_down:
src/resolve-dependency.ts 70.55% <100.00%> (+0.36%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Nov 03 '22 13:11 codecov[bot]

Won't we accidentally skip adding the query string back if the query string itself is included in the path? e.g: ./fruit/apples?fruit

Good catch, @lforst! I've made the check more robust.

lobsterkatie avatar Nov 08 '22 03:11 lobsterkatie

Is there a scenario where a file can contain a ? and dropping it would fail resolution?

I know that CJS needs to keep supporting it but I'm wondering if we need proper URL parsing to match how ESM works.

That's an edge case I hadn't thought of. I did a bunch of testing (details below) and here's what I found:

  • When resolving files, Node itself seems to handle CJS files by treating question marks like any other character (essentially, it ignores them), and treat ESM files by stripping everything after the first question mark. This means that with CJS, question marks in filenames are okay, but querystrings aren't (because it thinks the querystring is part of a filename). For ESM, it thinks question marks in filenames are the start of a querystring, so it can't handle them, but it can handle querystrings.
  • Current (before this PR) nft behavior seems to be to ignore question marks/treat them like a regular character, regardless of the CJS/ESM question. (IOW, it acts as if everything is CJS.) It therefore can handle files with question marks but can't handle querystrings because it's looking for files whose names include the querystring.
  • Webpack can't seem to handle question marks in filenames, whether we're using CJS or ESM.
  • The nextjs nft webpack plugin strips everything after the first question mark, again regardless of CJS vs ESM. (IOW, it acts as if everything is ESM.)

Or, put visually:

                                  | `?` in filename | Querystring | Notes                                                                                                                                                |
|---------------------------------|-----------------|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| Node CJS                        | ✅              | ❌          | Querystring doesn't work because it looks for filename with querystring in it                                                                        |
| Node ESM                        | ❌              | ✅          | `?` in filename doesn't work because everything afterwards is considered a querystring                                                               |
| Webpack CJS                     | ❌              | ✅          | Uses https://github.com/webpack/enhanced-resolve/blob/651e57863e46afeccf1946b4275d7b2339152ed1/lib/util/identifier.js#L14. which splits at first `?` |
| Webpack ESM                     | ❌              | ✅          | Uses https://github.com/webpack/enhanced-resolve/blob/651e57863e46afeccf1946b4275d7b2339152ed1/lib/util/identifier.js#L14. which splits at first `?` |
| bare `nft` CJS                  | ✅              | ❌          | Querystring doesn't work because it looks for filename with querystring in it                                                                        |
| bare `nft` ESM                  | ✅              | ❌          | Querystring doesn't work because it looks for filename with querystring in it                                                                        |
| Nextjs `nft` Webpack plugin CJS | ❌              | ✅          | Same as webpack because it uses webpack's resolver                                                                                                   |
| Nextjs `nft` Webpack plugin ESM | ❌              | ✅          | Same as webpack because it uses webpack's resolver                                                                                                   |

So... I'm not sure where that leaves me in terms of what should be done here. Which behavior should I be trying to emulate?

Details of my testing

To test out the possibilities, I ran a few tests using the following files, along with a corresponding set of .mjs files using ESM imports and exports:

file1.js

const file2 = require("./file2?aardvark");
console.log(file2.bear);

file2.js

const file3 = require("./file3");
console.log(file3.cheetah);

module.exports = { bear: "dinosaur" };

file3.js

const file4 = require("./file?4");
console.log(file4.emu);

module.exports = { cheetah: "fox" };

file?4.js

module.exports = { emu: "giraffe" };
  • Just looking at node module resolution, I'm not sure if this is what you were talking about, @styfle, but I found that for CJS, node couldn't resolve ./file2?aardvark, but could resolve ./file?4. For ESM, it was the reverse. Interestingly, I had to import from ./file2.mjs?aardvark and ./file?4.mjs rather than just the module name (and I had to name the files .mjs instead of just .js). Without the extension, it correctly stripped the ?aardvark but then couldn't find the file, and even with the extension, it stripped the ?4.mjs and then couldn't find a module called file. The same thing happened when I tried to import from ./file?4.mjs?heron and ./file?4?heron.

    Conclusion: In node, it seems CJS treats question marks like any other character (essentially ignores them), and ESM strips everything after the first question mark.

  • I then moved onto looking at nft's current behavior. I couldn't get nft why to work at all (it always just printed out the second argument), but when I tried nft print, with both CJS and ESM, it matched the node CJS behavior (couldn't handle the file2 querystring, could handle the question mark in the file?4 filename). Here ESM needed the .mjs if that's what the files were called, but could deal with bare module names if the files were just .js. With ESM each file printed out twice, which makes sense given that it's printing both the main file list and the ESM file list.

    Conclusion: Current nft seems to ignore question marks (treats them like a regular character), regardless of the CJS/ESM question.

  • Then onto the nextjs NFT plugin. Just to keep us on our toes, with CJS it mimicked the ESM behavior of node (handled the querystring but not the question mark in the filename), though in fact in that case I don't think it got as far as the plugin - I think webpack itself couldn't handle it. Also, in that case it wasn't stripping off everything after the question mark, it just couldn't handle having the question mark in the filename. The same thing happened with ESM.

    Conclusion: Webpack can't deal with question marks in filenames.

  • Finally, URL parsing. I used node url.parse method to parse /aardvark/bear/cheetah?dinosaur, /aardvark/bear/cheetah?dinosaur?emu, /aardvark/bear/cheetah.js?dinosaur, /aardvark/bear/cheetah.js?dinosaur?emu, /aardvark/bear/chee?tah, /aardvark/bear/chee?tah.js, /aardvark/bear/chee?tah?dinosaur, and /aardvark/bear/che?etah.js?dinosaur, and in every case, everything after the first question mark was considered querystring.

    Conclusion: Using url.parse(filepath) is the same as doing filepath.split("?")[1] if there's only one ? but not if there's more than one. I'll switch to parsing.

lobsterkatie avatar Nov 08 '22 07:11 lobsterkatie

See a previous PR for how to add a test: #289

  • input.js is the input file that nft traces
  • output.js is the expected files to be traced
  • any other file can be required/imported by input.js or others

Thanks. And yeah, I'd gotten the basic idea of input.js and output.js. I'm still wondering about this:

I think I probably need to add some webpack wrapper unit tests, but a) I'm not sure that's right, and b) it's not clear how those wrappers are generated.

I'm also now thinking that this should be tested outside of a webpack context as well. I'll get started on that in the meantime.

lobsterkatie avatar Nov 08 '22 07:11 lobsterkatie

A few more questions on tests:

  1. I can see that all of the output.js files have relative paths, but the unit test I added keeps producing absolute paths ("Users/Katie/Documents/Sentry/forks/nft-fork/package.json" rather than just "package.json"). How do I get it to ignore the first part of the path?

  2. On second thought, does it make any sense to be testing this outside of the context of webpack, given that that's why an import would have a query string (and given that bare nft right now can't handle querystrings in imports)?

  3. As for the test using webpack, if I were creating testsfrom scratch, I'd probably do two: one using a simplified version of our proxy loader to set up the situation where you have both ./someFile and ./someFile?someQuerystring pointing to two different modules in memory, and one using another loader setup to change the contents of a module based on a hard-coded query string. I'd then probably use a simple plugin (similar to the nextjs nft plugin) as a way to run nft, and write a script to run webpack and check the output of the plugin.

I'm not at all sure how to translate that into either a unit test or an integration test which will fit with your system, though. As I mentioned above, I do see the webpack wrapper tests in the unit test folder, but they seem to be testing how nft handles webpack-generated code (after the fact), whereas I claim I want to test how nft behaves as a part of webpack build (during the fact).

Thoughts?

lobsterkatie avatar Nov 08 '22 08:11 lobsterkatie

The purpose of nft is to simulate Node.js module resolution.

The existing webpack tests are trying to reverse the wrappers that webpack adds back to the original source code the author wrote. In particular, require('path') is mangled into dynamic code like path__WEBPACK_IMPORTED_MODULE_0___default() which is no longer statically analyzable.

I think this PR should ignore anything that webpack does and focus solely on Node.js module resolution behavior.

styfle avatar Nov 08 '22 13:11 styfle

The purpose of nft is to simulate Node.js module resolution.

Right, okay. I think I've now got it so that it does. If we're dealing with ESM, we strip the querystring (if any) in resolveDependecy and in emitDependency and its helpers, with the exception of

  • when we're marking a path as seen (so that having a different querystring makes it go through the process again),
  • when we're calling readFile (since readFile can be changed with an option and in some implementations, including the one in the nft webpack plugin, read different modules out of memory with different querystrings),
  • when we're calling resolve (since this is also something which can be supplied by the user, and it might care about query strings),
  • when we're caching analysis results (since, again, we may have different code to analyze if we have a different querystring), and
  • when we're making the recursive call to emitDependency (or now its helper, since we need it there to be able to start the whole cycle over).

I think this PR should ignore anything that webpack does and focus solely on Node.js module resolution behavior.

Yeah, that's fair. I think I can probably (?) mock the relevant thing it does, which is distinguish between modules if the querystring is different. The other tests should hopefully be more straightforward. UPDATE: Done.

lobsterkatie avatar Nov 15 '22 06:11 lobsterkatie

UPDATE: Tests are now pushed and I've updated the PR description to reflect the results of our conversations above, so I will mark this ready for review. Please let me know if there's anything I missed!

lobsterkatie avatar Nov 17 '22 04:11 lobsterkatie

Hey, @styfle - I fixed the windows CI issues. Mind taking another look?

lobsterkatie avatar Nov 22 '22 03:11 lobsterkatie

Hey, @styfle - I know it's a crazy time of year, would love to get this in before the year's out, if possible, as it's blocking some of our customers. I'll be AFK most of this week (early family celebrations) and then back the week after. Cheers!

lobsterkatie avatar Dec 12 '22 15:12 lobsterkatie

Is this still needed? https://github.com/vercel/nft/pull/336#issuecomment-1443925199

styfle avatar Feb 24 '23 21:02 styfle

Is this still needed? #336 (comment)

Not from our side (Sentry SDK). I'll of course leave it up to you if you still want to implement this for the sake of consistency.

lforst avatar Feb 24 '23 21:02 lforst