typescript-go icon indicating copy to clipboard operation
typescript-go copied to clipboard

Deduplicate doubly-installed packages

Open benjaminjkraft opened this issue 6 months ago • 10 comments

We have some package (in our case @sentry/browser) installed two places in our tree (in our case, two different npm workspaces which npm in its infinite wisdom did not figure out how to deduplicate, although I think this can come about from dependencies-of-dependencies as well). This package exports a class with a protected method. Assigning an instance of one to the other works in TS, but fails in Go, because I guess it didn't deduplicate the package like TS does. (There are also other cases where deduplicating affects behavior, mostly in very complex types, and probably many more where it affects performance.)

Let me know if more details are helpful to repro, it's a bit more ceremony to construct than most repros and it sounded like @jakebailey already knew enough to be aware of the problem, but I can set something up if it's more of an edge case and you need one.

benjaminjkraft avatar Jun 06 '25 03:06 benjaminjkraft

This is not a new issue for ts-go though is it ? Its present in the original typescript?

lukeapage avatar Jun 06 '25 04:06 lukeapage

At least in our configuration in this particular case it isn't! I think the behavior in question is the one referenced here. Although it's possible there's some sort of moduleResolution cruft involved here as well (we are still cleaning some of it up).

benjaminjkraft avatar Jun 06 '25 04:06 benjaminjkraft

I found the code to do this in Strada.

https://github.com/microsoft/TypeScript/blob/e6a50a78619632bba7af4f23b2e7e4940b9cf788/src/compiler/program.ts#L3671-3690

        if (packageId) {
            const packageIdKey = packageIdToString(packageId);
            const fileFromPackageId = packageIdToSourceFile.get(packageIdKey);
            if (fileFromPackageId) {
                // Some other SourceFile already exists with this package name and version.
                // Instead of creating a duplicate, just redirect to the existing one.
                const dupFile = createRedirectedSourceFile(fileFromPackageId, file!, fileName, path, toPath(fileName), originalFileName, sourceFileOptions);
                redirectTargetsMap.add(fileFromPackageId.path, fileName);
                addFileToFilesByName(dupFile, path, fileName, redirectedPath);
                addFileIncludeReason(dupFile, reason, /*checkExisting*/ false);
                sourceFileToPackageName.set(path, packageIdToPackageName(packageId));
                processingOtherFiles!.push(dupFile);
                return dupFile;
            }
            else if (file) {
                // This is the first source file to have this packageId.
                packageIdToSourceFile.set(packageIdKey, file);
                sourceFileToPackageName.set(path, packageIdToPackageName(packageId));
            }
        }

I am at a loss for how to do this here; this chooses which file is the winner based on which one is found first, but the order in which we find files is now undefined thanks to us loading files concurrently.

We could choose the winner based on some other criteria (path length?), except again, we are concurrent, so we're likely going to have already gone and attempted to load everything that the the "losing" file referenced, and that work is inseparable from the rest of the file graph by then...

jakebailey avatar Jun 25 '25 00:06 jakebailey

At best, we could defer all of this work to a second pass, but that would change the file ordering from Strada too. Perhaps that's an acceptable tradeoff, but means that we would probably defer most file resolutions until a second/third/... pass.

I really wish we didn't have this feature. ☹

jakebailey avatar Jun 25 '25 00:06 jakebailey

Let me know if you have a POC PR you want me to run against our repo, we have a lot of dependency hell 🙃. I'm also happy to share more of the specific cases we hit if it's helpful context for deciding how to proceed, I'm not sure I want to try to isolate them out of our codebase but in non-reproducible sketches or privately I could share more.

My guess is as long as it's deterministic this is no worse than the union ordering stuff, with deduplication not existing we had only 5-10 packages have issues across our whole repo, and ~half of those we were being a bit careless anyway (e.g. some of them were genuine cases of having two different versions in the tree and expecting them to talk to each other; deduplicating was papering over potentially real incompatibilities). But it is a bit confusing when it happens, especially the protected methods case where you might want it to just work. And sadly anybody using npm workspaces, especially, is gonna find themselves in this state. (The whole exercise is really pushing us to get off npm.)

benjaminjkraft avatar Jun 25 '25 01:06 benjaminjkraft

I don't have anything yet, though a self contained repro would be useful.

jakebailey avatar Jun 25 '25 01:06 jakebailey

I guess a "good thing" is that whatever we do cannot be worse than the current situation, given we already load all files without deduplication and people are happy with the perf.

jakebailey avatar Jun 25 '25 17:06 jakebailey

Adding a data point here: the new behavior of not deduplicating actually catches real-world errors where “nominality” or something like it matters. In working through Vanta’s codebase, I found a couple cases where we have multiple resolutions of transitive dependencies that depend on symbol checks at runtime, where the types end up like this:

declare const typeId: unique symbol;

declare type TheType = {
  [typeId]: true
  // ...
};

declare function isTheType(something: unknown): something is TheType;

And the isTheType helper relies on the actual typeId symbol. If you have two copies of this in your transitive dependencies (even if that is IMO extremely undesirable, it can happen for a variety of reasons), saying that copy1.TheType and copy2.TheType are mutually assignable is quite wrong and could produce nasty runtime errors. In the specific case where I’m seeing it (Vercel’s AI SDK) I strongly suspect it’s “fine” in practice, but it’s dangerous, and the new behavior correctly identifies a real and serious hazard.

To pick just one failure mode that came to mind immedaitely: If you have singleton module state, for example, that is gated on a check like this, you could end up with very bad behavior.

Net, I’d very much rather this new behavior be maintained and consumers be given tools to migrate. Wild spitballing: an opt-in flag in Strada which warns (or, depending on compiler flags, errors) on it?

chriskrycho avatar Oct 28 '25 16:10 chriskrycho

https://github.com/microsoft/TypeScript/pull/62684 shows that not doing this only breaks one codebase out of the top 800 TS repos, which is a lot better than I expected.

To implement this, I can only imagine us doing a normal load, then introducing a final extra pass that does the deduping. I suspect that would be "fine" because if the packages are identical, their deps should also be the same so it's not the worst to leave them in the program...

But, I would certainly rather not introduce this idea back into the project. Would be so much easier...

jakebailey avatar Oct 28 '25 21:10 jakebailey

For those with this issue, can you try building https://github.com/microsoft/typescript-go/pull/2361 from source to see if it fixes your issues?

I am not happy with this PR, but I want to see if I'm on the right track.

jakebailey avatar Dec 12 '25 05:12 jakebailey

For those with this issue, can you try building #2361 from source to see if it fixes your issues?

I am not happy with this PR, but I want to see if I'm on the right track.

Unfortunately, no luck with the reproduction I shared: https://github.com/microsoft/typescript-go/issues/1906

 /Users/ben/Development/typescript-go/built/local/tsgo --build
src/index.ts:7:13 - error TS2345: Argument of type 'import("/Users/ben/Development/tsgo-bug-1/pkg2/node_modules/.pnpm/[email protected]/node_modules/kysely/dist/esm/kysely").Kysely<any>' is not assignable to parameter of type 'import("/Users/ben/Development/tsgo-bug-1/pkg1/node_modules/.pnpm/[email protected]/node_modules/kysely/dist/esm/kysely").Kysely<any>'.
  Property '#private' in type 'Kysely' refers to a different member that cannot be accessed from within type 'Kysely'.

7 wantsKyself(db);
              ~~


Found 1 error in src/index.ts:7

Benjamin-Dobell avatar Dec 14 '25 05:12 Benjamin-Dobell

Yeah, I guess that is the sort of problem that cannot be resolved by my attempt, because the symbols themselves are not deduped, I've just hacked the checker to avoid some of the errors, but that will not solve class nominality.

jakebailey avatar Dec 15 '25 17:12 jakebailey

@Benjamin-Dobell Can you try #2361 again? With the latest version, your example appears to work for me.

jakebailey avatar Dec 15 '25 19:12 jakebailey

@Benjamin-Dobell Can you try #2361 again? With the latest version, your example appears to work for me.

It works! 🎉

This is hugely appreciated. Thank you so much for implementing this!

I have a monstrosity of a TypeScript project (a game built with GodotJS) and typescript-go makes a world of difference during development.

Benjamin-Dobell avatar Dec 16 '25 03:12 Benjamin-Dobell