[RRFC] Only use registry provided URLs to fetch tarballs
Motivation ("The Why")
npm 7 introduced a change where it computes tarball URLs from the registry name instead of using the URL provided by the registry. This causes third party registries that don't host their own tarballs to fail.
How
Current Behaviour
Tarball URLs are rewritten to one's that are not guaranteed to match the one's that are registry provided nor in the lockfile. This amounted to a substantial change to the npm protocol.
Desired Behaviour
This seems to have been implemented so that when you change your default registry, npm would download tarballs from the new registry regardless of the lockfile. This is clearly desirable behavior, but the assumptions about how third party registries store their tarballs are not always correct.
I would propose that:
A) The registry config in use when a lock-file is generated should be recorded to the lockfile. B) When the currently configured registr(y|ies) changes (determined by comparing config to lockfile), npm will refresh tarball URLs by querying metadata from the new registry. This would allow for the new npm7 functionality without breaking third party registries that don't host their tarballs the same way that npmjs.org does.
Open Questions
- When npm finds a registry change or new tarball URLs should it update the lockfile?
- Naively, the answer feels like yes, but my impression is that the core use case of this feature is for environments where the same lockfile is used both in public environments (with npmjs.org) and private one's (with a private registry). If so, updating the lockfile every time would not be desirable.ee
References
- Details and a reproduction of the behavior are available in bug report #3530.
Initial thoughts (we'll talk it through more in the meeting today, so any/all of this subject to change, of course). I'm definitely leaning in favor of this proposal.
- Tracking the registry in use in the lockfile is an extremely good idea. I'm surprised no one suggested it before now, but I guess most good ideas look obvious once you see them 😅 I think this can be done as a non-breaking change to the registry format, not sure if it's worthy of bumping the
lockfileVersion(which is a pretty big change), but if not, it does mean we'll have to check for the existence of the field. (We already do this with therequires: trueflag, so it's not terrible if we go that way.) - Historical note, the
https://registry.npmjs.org/string has been slightly magical for quite some time, and gets replaced with the--registryconfig. We tried to change things to use a more obvious token that was clearly magical, like${REGISTRY}or something, but that broke a lot of stuff, and would have made lockfiles unusable by npm v6. I'm not sure we'll be able to de-magic it entirely, but this is a step in that direction, at least explicitly defining what the magic registry field was at the time the tree was resolved. - Really, the main user-facing behavior shift in this suggestion, for most users, would be some extra HTTP requests for packuments when you change your registry, but that's probably not going to impact performance too horribly. The internal complexity will be forking our "just grab the
resolvedfield, whatever it is" behavior. - The users who might be impacted by this (and the reason it is a breaking change) is if you have a passthrough proxy that doesn't modify the response body at all, and is used as the
--registryin an environment where the registry is not reachable. In those cases, it'll cause npm to try to fetch the tarballs from the dist.tarball url, which ishttps://registry.npmjs.org/..., rather going through the proxy. However, in that kind of case, what they ought to do is configure it as a true HTTP proxy, since npm has support for that. So, I'm fine with breaking that use case, but it is a breaking change that will affect someone, I'm sure.
When npm finds a registry change or new tarball URLs should it update the lockfile?
If --save is set, or for other reasons we'd be saving a lockfile, then I'd say yes, it definitely should. (For the same reason we update resolved, integrity, etc.)
In environments where you're running npm ci, or otherwise doing a no-save reification, we don't write the lockfile at all, so we shouldn't change it.
~~I would also note that the problematic rewriting behavior only happens with scoped modules. IMO at the VERY least it should not have different behaviors for scoped and unscoped modules.~~ (This was a bug in my repro. XD)
Also, lockfiles are not required -- the repro does not use a lockfile, it simply breaks basic installs.
In case you find it useful, we solved this problem in Yarn by specifying that "standard registry patterns" are of the form:
/<scope>/<name>/-/<name>-<version>.tgz
If the tarball in metadata matches this URL pattern, Yarn won't store the tarball path and will derive it automatically from whatever is the currently configured registry. If it doesn't, Yarn will store the full URL within the lockfile (which will break the "easy registry switch scenario"). We found this system of graceful degradation working fairly well for us.
Notably, I believe that won't match any extant Artifactory registry.
Indeed, but we didn't get requests to make it work nonetheless. Notably, a few issues have been raised by people using them, as evidenced by the logs they shared, but they never complained about it.
There's also a decent case that as the registry maintainers, you probably have more pull to standardize this behavior if you so decided (ref https://github.com/npm/feedback/discussions/55).
What are the next steps on this? Would love to help contribute in anyway I can. https://github.com/npm/cli/issues/3533 is still an incredibly infuriating problem in npm.