Dependency confusion between internal registries and General

Open Seelengrab opened this issue 3 years ago • 41 comments

A recent, novel supply chain attack on some package managers is also possible in certain Pkg/Registry configurations.

The gist of it is that some package managers, when given a package name, by default look in "internal" repos first, then also check the "public" repos and install whichever returns a higher version number. For this attack to be successful in Pkg, the attacker would also have to know the UUID of the internal package and register a package in General with both the same name and the same UUID, but a higher version number (e.g. 9001.0.0). Once registered, Pkg installs whichever version is higher, thereby allowing "shadowing" of the internal package with a malicious package.
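To make the mechanism concrete, here is a minimal sketch of the "union all registries, take the highest version" behaviour described above. It is not Pkg's actual resolver; the in-memory registry model, the function name `resolve_latest`, and the placeholder UUID are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]
# Hypothetical in-memory model: registry name -> {(package name, UUID): versions}
Registries = Dict[str, Dict[Tuple[str, str], List[Version]]]


def resolve_latest(registries: Registries, name: str, uuid: str) -> Tuple[Version, str]:
    """Union the versions of (name, uuid) across registries and pick the highest."""
    candidates = [
        (version, registry)
        for registry, packages in registries.items()
        for version in packages.get((name, uuid), [])
    ]
    if not candidates:
        raise KeyError(f"{name} [{uuid}] not found in any registry")
    # Only the version number decides; the registry of origin is ignored.
    return max(candidates, key=lambda c: c[0])


UUID = "00000000-0000-0000-0000-000000000001"  # placeholder, not a real package UUID
registries = {
    "Internal": {("PkgX", UUID): [(1, 0, 0), (1, 2, 0)]},
    "General": {("PkgX", UUID): [(9001, 0, 0)]},  # the attacker's registration
}
version, source = resolve_latest(registries, "PkgX", UUID)
print(version, source)  # (9001, 0, 0) General
```

Because the name and UUID match, the malicious 9001.0.0 entry wins even though every legitimate version lives in the internal registry.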

An MWE can be found at https://github.com/DilumAluthge/MWE_multiple_registries_same_package_uuid.

The intention for all possible fixes is to preserve the ability to have multiple registries available to provide the same package. This should not allow attackers to intentionally register packages with the same name & UUID as another package in a different registry and mislead people into downloading their malicious package.


A non-breaking fix is for each private registry user who also uses General to use the 3 day waiting period to monitor for clashes in new package registrations to General. This should be automatable with some tooling, which comments on the PR to General and thus stops the automerge. As a precaution, private registry users may want to create new UUIDs for their internal packages and investigate how the UUID leaked in the first place.

Another non-breaking fix on a registry-per-registry basis would be to mirror & vet General manually, though this is somewhat high maintenance and thus unlikely to be useful in practice. This would also require some investigation into how the UUID leaked, should a mismatch be detected.

A possible long-term fix would involve a new shadowable entry in Package.toml, which would be opt-in and signal that the package is allowed to come from other registries as well. In this model, all installed registries that have the same combination of (name, UUID) would also need to have shadowable=true set for that package. If any registry doesn't have this set, we error.
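The proposed rule can be sketched as a small check. This is only an illustration of the idea, not an existing Pkg feature: `shadowable` is the key suggested above, while `check_shadowable`, `ShadowingError`, and the dict-based representation of each registry's Package.toml are assumptions.

```python
from typing import Mapping


class ShadowingError(Exception):
    """Raised when a package appears in several registries without full opt-in."""


def check_shadowable(entries: Mapping[str, Mapping[str, object]]) -> None:
    """entries maps registry name -> parsed Package.toml for one (name, UUID)."""
    if len(entries) <= 1:
        return  # only one registry provides the package; nothing can shadow it
    # Every registry that carries the package must opt in explicitly.
    missing = sorted(r for r, pkg in entries.items() if pkg.get("shadowable") is not True)
    if missing:
        raise ShadowingError("not marked shadowable in: " + ", ".join(missing))


# An open-sourced package that both registries have opted in passes the check.
check_shadowable({"Internal": {"shadowable": True}, "General": {"shadowable": True}})
```

An attacker registering the same (name, UUID) in General cannot set the flag in the internal registry, so the check fails instead of silently merging versions.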

This would be a breaking change, so our options are:

  1. Do it in Julia 2.0.
  2. Do it in Julia 1.x, but make a lot of Slack posts and Discourse posts informing people of the change, and make it easy for people (JC, Invenia, Beacon, etc) to make the changes needed. This would only affect local/internal registries that have shared a package with other registries (e.g. by open sourcing them to General). We should work closely with those who are known to have open-sourced packages to make their transition as easy as possible.

Seelengrab avatar Feb 10 '21 11:02 Seelengrab

Many thanks to @ericphanson and @DilumAluthge for brainstorming both short- and long-term fixes to this!

Seelengrab avatar Feb 10 '21 11:02 Seelengrab

Another complication is the case when two intentionally public registries (e.g. General and HolyLabRegistry) are able to shadow each other. This could be prevented by specifying exactly which registries are allowed to shadow the package (or rather, which registries are trusted for a specific package). These lists would have to agree across all registries, otherwise we error.

I'm not 100% sure about this, because it requires all registries to be updated & maintained at roughly the same time. Kind of feels like a mismatch is bound to happen eventually here :/

Seelengrab avatar Feb 10 '21 17:02 Seelengrab

I think they could just both put shadowable=true, and then opt-out of this safety check. I don't think that case is really an issue because both packages are controlled by the same maintainers.

I think the security issue is only when PkgX is in RegistryA but not RegistryB, and users of PkgX use both registries, and an adversary has the ability to register a package with the same name and UUID in RegistryB. I think for most practical purposes the only possible RegistryB is General, since it has publicly-available automerge and is installed by default on all Julia installations.

Therefore, if PkgX is already in General, there isn't really a security issue and one can just use the opt-out.

edit: deleted paragraph saying exactly the same thing as you @seelengrab about using a list of registries :). I don't think that's really needed at this point and adds to the burden like you said.

ericphanson avatar Feb 10 '21 20:02 ericphanson

The issue of shadowing packages from other public registries with a new package in General could also be mitigated by having auto-merge block registration of UUIDs that exist in public registries it has been configured to know about.

GunnarFarneback avatar Feb 10 '21 21:02 GunnarFarneback

A possible long-term fix would involve a new shadowable entry in Package.toml, which would be opt-in and signal that the package is allowed to come from other registries as well. In this model, all installed registries that have the same combination of (name, UUID) would also need to have shadowable=true set for that package. If any registry doesn't have this set, we error.

A non-breaking variation of this would be to only disallow merging if Registry.toml contains a field with that meaning, and have an option to allow merging on a package basis by a mergable entry in Package.toml. Possibly this could also specify exactly which registries to allow merging with. Thus General would not make use of this feature and be completely unaffected but you could set it in your private registry to make sure that your packages cannot be shadowed by General. And in case you do make some of your packages public, there's an override mechanism to allow those to be merged with General.
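A rough sketch of how such an opt-in restriction could be evaluated for one registry. The `mergable` key is the one named above; the registry-level key `no_merge`, the function name, and the choice to let `mergable` be either a boolean or a list of registry names are assumptions for illustration.

```python
from typing import Mapping


def registry_permits_merge(
    registry_toml: Mapping[str, object],
    package_toml: Mapping[str, object],
    other_registry: str,
) -> bool:
    """Does this registry allow its package entry to be merged with other_registry?"""
    if not registry_toml.get("no_merge", False):
        return True  # registry has not opted in to the restriction (e.g. General)
    mergable = package_toml.get("mergable", False)
    if isinstance(mergable, bool):
        return mergable  # blanket per-package override
    return other_registry in mergable  # explicit list of allowed registries


internal = {"name": "Internal", "no_merge": True}
print(registry_permits_merge(internal, {}, "General"))                         # False
print(registry_permits_merge(internal, {"mergable": ["General"]}, "General"))  # True
print(registry_permits_merge({"name": "General"}, {}, "Internal"))             # True
```

Since the default is "no restriction", registries that never set the field behave exactly as they do today, which is what makes the variation non-breaking.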

GunnarFarneback avatar Feb 11 '21 08:02 GunnarFarneback

I had thought through letting private registries publish a salted, hashed list of UUIDs that could be checked for collisions, but there's a bit of a problem with that: how do you distinguish the original author taking their own package open source from someone else trying to hijack their private package UUID? One answer could be that we examine the situation manually and make a judgement. Otherwise it seems like there needs to be a way of proving that you were the party that submitted the original salted and hashed entry, which gets into tricky crypto territory. Not impossible, but not simple.
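For reference, a minimal sketch of what a salted, hashed UUID list could look like. The scheme shown (SHA-256 over salt plus UUID, with the salt published alongside the hashes so that a checker can recompute them) and all function names are assumptions; no concrete format is specified in this thread.

```python
import hashlib
import secrets
from typing import Iterable, Set, Tuple


def _digest(salt: bytes, uuid: str) -> str:
    """Hash one UUID with the registry's salt."""
    return hashlib.sha256(salt + uuid.lower().encode("ascii")).hexdigest()


def publish_protected(uuids: Iterable[str]) -> Tuple[bytes, Set[str]]:
    """Private registry side: produce a (salt, hashes) pair that is safe to publish."""
    salt = secrets.token_bytes(16)
    return salt, {_digest(salt, u) for u in uuids}


def is_protected(salt: bytes, hashes: Set[str], candidate_uuid: str) -> bool:
    """Public registry side: does a newly registered UUID collide with the list?"""
    return _digest(salt, candidate_uuid) in hashes


# Placeholder UUIDs for illustration only.
salt, hashes = publish_protected(["00000000-0000-0000-0000-000000000001"])
print(is_protected(salt, hashes, "00000000-0000-0000-0000-000000000001"))  # True
print(is_protected(salt, hashes, "00000000-0000-0000-0000-000000000002"))  # False
```

Note that this only detects a collision. It says nothing about who submitted the registration, which is exactly the ownership problem raised above.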

Moreover, once package authors need to have proof that they "own" a UUID (in the above scenario, just to be able to take it public at some point), then why bother with the rest? If authors have a private key, they can just sign each release's hash and those signatures can be checked with the public key that's in the registry. If the public keys in different registries don't match, then the client can refuse to install.

You would want a way for authorities that you trust to sign versions with other keys so that your local admins can publish hotfix versions of public packages, but that can also be arranged.

StefanKarpinski avatar Apr 01 '21 20:04 StefanKarpinski

In the public version of this that is implemented in AutoMerge, the rule is to allow registration of a protected UUID if name and repo matches, on the assumption that you can't effectively hijack a package if you still have to point it to the original author's repo.

Wasn't the design for the hashed version similar in that respect?

GunnarFarneback avatar Apr 01 '21 21:04 GunnarFarneback

Right, but what if you need to change the URL? It's all nice in theory to think that URLs are forever but we know that in reality they are not. If a new hashed record is submitted, how do we decide whether it's ok to let it replace the old one?

StefanKarpinski avatar Apr 01 '21 21:04 StefanKarpinski

Fundamentally we have the same problem with repo changes of packages that have already been registered in General, except of course that there is more data to make a judgement call from. The easy way out, with its own problems, is to require someone who has protected a UUID in this way and wants to open source the package, to do that with a new UUID.

GunnarFarneback avatar Apr 02 '21 07:04 GunnarFarneback

That's a bitter pill to swallow given that the current system has made it so smooth and easy for people to open source their private packages and keep using them without issues. And that's something we really want to encourage.

StefanKarpinski avatar Apr 02 '21 20:04 StefanKarpinski

We don't have to do something mechanical and rigid here: if an organization has previously published a hashed list of UUIDs, is taking something open source and wants to change the URL, we can always evaluate it using human judgement; the only thing that needs to be automatic is the rejection of attack attempts. But that approach does mean that we need to be able to identify what's going on with the hash lists in order to be able to make judgements about whether a URL change should be allowed or not.


Unrelated, but here's my high-level thinking about this problem. Fundamentally this is about who each person trusts to publish new versions of various packages. One generally trusts the original author, so when they make a new release, we're happy to upgrade to it. We also sometimes want to trust some other entity like our own organization's sysadmins to publish new versions of packages they don't maintain for hotfixes and the like. But that should be a conscious choice on the part of the user or a preconfigured policy on corporate machines.

StefanKarpinski avatar Apr 02 '21 20:04 StefanKarpinski

A non-breaking fix is for each private registry user who also uses General to use the 3 day waiting period to monitor for clashes in new package registrations to General. This should be automatable with some tooling, which comments on the PR to General and thus stops the automerge.

Are there any tools available for automating these monitoring checks?

cossio avatar Apr 09 '22 11:04 cossio

Not that I've heard of. A simpler but less proactive approach is to periodically (e.g. with a scheduled job) check for UUID collisions between General and your own registry and raise an internal alarm if one is found.
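A scheduled check along those lines could be as small as the sketch below. It assumes each registry's package list has already been loaded into a UUID-to-name mapping (for example from the registry's Registry.toml); the function name and the `allowed` parameter for intentionally shared packages are assumptions, not an existing tool.

```python
from typing import Dict, Iterable, Mapping, Tuple


def find_uuid_collisions(
    own: Mapping[str, str],
    general: Mapping[str, str],
    allowed: Iterable[str] = (),
) -> Dict[str, Tuple[str, str]]:
    """Return {uuid: (name in own registry, name in General)} for unexpected overlaps."""
    # UUIDs present in both registries, minus those known to be shared on purpose.
    shared = (own.keys() & general.keys()) - set(allowed)
    return {uuid: (own[uuid], general[uuid]) for uuid in sorted(shared)}


# Placeholder UUIDs and names for illustration only.
own = {"uuid-1": "PkgX", "uuid-2": "PkgY"}
general = {"uuid-1": "PkgX", "uuid-3": "Example"}
collisions = find_uuid_collisions(own, general)
if collisions:
    print("ALERT: UUID collisions with General:", collisions)
```

Packages that were deliberately open sourced would go into `allowed`, so the alarm only fires for overlaps nobody expected.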

If your registry is public you can make a PR to add your registry to https://github.com/JuliaRegistries/General/blob/736de1456b8ce65a24ed0003835d370f06451f13/.github/workflows/automerge.yml#L84 so that collisions are stopped by AutoMerge. I'm not sure if that's documented somewhere.

GunnarFarneback avatar Apr 09 '22 11:04 GunnarFarneback

What if we have a list of "trusted" / "untrusted" registries for each package?

When a package is installed for the first time, its registry of origin (call it registry A) is added as trusted and updates coming from this registry can proceed automatically.

If a new version of this package shows up in another registry (B), Pkg prompts the user what to do. If the user selects to upgrade from B, then registry B is also added as trusted for this package. After that the behavior of Pkg can be the current one regarding the rules for when two registries contain the same package. However, if the user selects not to "trust" the new version in B, then Pkg adds B as untrusted for this package, and does not consider updates coming from B for this package anymore, only updating the package if new versions appear in A.
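A minimal sketch of the bookkeeping this would need, to make the idea concrete. Nothing like this exists in Pkg; the class names and the `ask` callback standing in for the interactive prompt are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class PackageTrust:
    trusted: Set[str] = field(default_factory=set)
    untrusted: Set[str] = field(default_factory=set)


class TrustDB:
    """Per-package lists of trusted and untrusted registries."""

    def __init__(self, ask: Callable[[str, str], bool]) -> None:
        self._ask = ask  # callback(pkg_uuid, registry) -> user's answer
        self._db: Dict[str, PackageTrust] = {}

    def accepts(self, pkg_uuid: str, registry: str) -> bool:
        entry = self._db.get(pkg_uuid)
        if entry is None:
            # First install: the registry of origin is trusted implicitly.
            self._db[pkg_uuid] = PackageTrust(trusted={registry})
            return True
        if registry in entry.trusted:
            return True
        if registry in entry.untrusted:
            return False
        # Unknown registry for this package: ask once and remember the answer.
        if self._ask(pkg_uuid, registry):
            entry.trusted.add(registry)
            return True
        entry.untrusted.add(registry)
        return False


db = TrustDB(ask=lambda pkg, reg: False)  # a user who always declines
print(db.accepts("pkgx-uuid", "A"))  # True
print(db.accepts("pkgx-uuid", "B"))  # False
```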

Could something like this work?

cossio avatar Apr 09 '22 19:04 cossio

I think a package “should” have only one trusted group allowed to issue versions (org/committer/company/whatever), so if there’s any “untrusted” versions showing up in a registry that’s a big security issue that should be resolved at the registry level, by yanking those versions and investigating, not by someone’s client just ignoring them. I think some kind of alerting thing is good but I think the way we act on that alert shouldn’t be “ok for this particular user, they don’t want these versions”.

ericphanson avatar Apr 09 '22 20:04 ericphanson

I think there's two things you need here:

  1. Allow a package to declare that certain registries are trusted for it.
  2. Allow a user to declare that certain registries are trusted for them.

Why do you need both of these? The first is what has already been suggested, and it prevents someone injecting an untrusted version of a package in some other registry, e.g. a malicious version of a private package in the general public registry.

So why do you need the other? Because it's useful for private registries to be able to release hot-fixes or modified versions of packages in other registries, but they need to trust those registries to do this for them.

There's a question of how to bootstrap these. The first registry in which someone sees a package can be implicitly trusted for it; otherwise we'd need to manually start the trust list somewhere. If a new trust declaration appears in a trusted registry, that can be trusted as well. That would allow transferring a package from one registry to another by adding the destination registry to the trust list and then doing the transfer. I think that not having an explicit trust list for a package in a registry should probably be equivalent to a trust list containing only that registry. So you could do a package transfer like this:

  1. Initially RegistryA has an entry for PkgX with no explicit trust list, which means only RegistryA is trusted. RegistryB has no entry for PkgX.
  2. Add an explicit trust = [RegistryA, RegistryB] entry in the Package.toml file for RegistryA (these are registry UUIDs).
  3. Copy the PkgX directory from RegistryA to RegistryB. Now new versions from either registry will be trusted.
  4. Delete the PkgX directory from RegistryA.
  5. Delete the trust = [RegistryA, RegistryB] entry in PkgX/Package.toml from RegistryB, leaving only RegistryB in the implicit trust list for PkgX.

If you want two registries to both be allowed to publish versions of a package, you can just leave both registries in the middle state (step 3) indefinitely. This could be expanded to any number of registries.
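The transfer steps can be walked through with a small model. This is a sketch under stated assumptions: registries are plain dicts, registry names stand in for registry UUIDs, and the lists of all registries carrying the package are combined by intersection, which is one possible rule rather than a settled part of the proposal.

```python
from typing import Dict, List, Set

# Hypothetical model: registry -> {package -> Package.toml contents as a dict}
Registries = Dict[str, Dict[str, dict]]


def trust_list(registries: Registries, registry: str, pkg: str) -> Set[str]:
    """Explicit trust list, or implicitly just the registry itself."""
    entry = registries[registry][pkg]
    return set(entry.get("trust", [registry]))


def accepted_sources(registries: Registries, pkg: str) -> Set[str]:
    """Intersect the trust lists of every registry that carries pkg (an assumption)."""
    lists = [trust_list(registries, r, pkg) for r, pkgs in registries.items() if pkg in pkgs]
    return set.intersection(*lists) if lists else set()


regs: Registries = {"RegistryA": {"PkgX": {}}, "RegistryB": {}}
states: List[Set[str]] = [accepted_sources(regs, "PkgX")]      # step 1

regs["RegistryA"]["PkgX"]["trust"] = ["RegistryA", "RegistryB"]
states.append(accepted_sources(regs, "PkgX"))                  # step 2

regs["RegistryB"]["PkgX"] = dict(regs["RegistryA"]["PkgX"])    # copy the entry
states.append(accepted_sources(regs, "PkgX"))                  # step 3

del regs["RegistryA"]["PkgX"]
states.append(accepted_sources(regs, "PkgX"))                  # step 4

del regs["RegistryB"]["PkgX"]["trust"]
states.append(accepted_sources(regs, "PkgX"))                  # step 5

for step, state in enumerate(states, start=1):
    print(step, sorted(state))
```

The accepted set starts as RegistryA only, covers both registries during steps 2 to 4, and ends as RegistryB only once the explicit entry is removed.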

Allowing users to declare that they trust certain registries to release versions of packages is the other design question here. We could potentially prompt for that if a new version of a package appears in a registry that isn't in the official trust list for a package. This could look like this:

PrivateRegistry has a new version of PkgX but isn't an officially trusted registry for PkgX. This could be an attack. Do you want to trust releases of versions from PrivateRegistry?
 [N] No: I do not trust PrivateRegistry to make unofficial releases
 [y] Yes: trust releases of PkgX from PrivateRegistry
 [a] All: trust releases of all packages from PrivateRegistry

This information could be saved in the PrivateRegistry.toml file in ~/.julia/registries as something like this:

git-tree-sha1 = "666dd7dc07e7949324d20591cde13de3b45ee1a8"
uuid = "20e4b06f-4c3f-4406-9bab-e758a9cb7e70"
path = "PrivateRegistry"
trust = true

Or for a specific list of packages for which the registry is trusted:

git-tree-sha1 = "666dd7dc07e7949324d20591cde13de3b45ee1a8"
uuid = "20e4b06f-4c3f-4406-9bab-e758a9cb7e70"
path = "PrivateRegistry"

[trust]
87703c6c-5a47-4a8b-8c61-6f07ed343807 = "PkgX"

The only other feature I can think of here would be allowing some registries to be trusted but only to provide new releases of packages from some other specific registries. But I'm not sure that's actually a useful feature: trust is pretty much all or nothing here. When you have trust = true in a registry's file, there's no reason not to trust it with everything. I'm not even entirely convinced that having a trust list for specific packages is useful. Why would you trust a registry to make unofficial releases of some packages but not others?

StefanKarpinski avatar Apr 25 '22 14:04 StefanKarpinski

Some care needs to be taken about the situation where different registries disagree about what the set of trusted registries for a package is. For example, what happens when two already-trusted registries list different sets of trusted registries for a package? My gut says that we should take the intersection. But scenarios like the registry transfer one need to be thought through carefully to make sure they're possible. There's also the case where a trusted registry says another registry is trusted for a package, but the other registry doesn't yet have any versions of that package.

StefanKarpinski avatar Apr 25 '22 15:04 StefanKarpinski

It occurs to me that we need to keep a "trust database" somewhere anyway, and we could record the "trust this registry to make unofficial releases" flag there as well instead of in the TOML file for the registry (which gets rewritten regularly).

StefanKarpinski avatar Apr 25 '22 15:04 StefanKarpinski

It feels like this discussion is starting to circle back to code signing, trust graphs (PGP/GPG?), CAs and all that entails. We should be careful not to reinvent a bad wheel here and maybe consult someone with professional expertise in establishing trust, signatures and so on. Especially your comment about a "trust database" makes me think of lots of prior art that already does that in various domains.

It may also be beneficial to tie this in with (possibly future) binary julia artifacts, to link them reproducibly to a given commit/version that artifact got built from.

Seelengrab avatar Apr 25 '22 15:04 Seelengrab

Why not just ask the user whenever a package is present in more than one registry, which one to use? That is, for each package detected to be present in more than one registry, Pkg will maintain locally a list of the registries it trusts for that package, which are manually approved by the user.

cossio avatar Apr 25 '22 15:04 cossio

@Seelengrab, it's not really—I didn't mention signatures or CAs at all 🙂. The "trust database" is literally just a place where you record which package UUIDs have been seen in which registries previously. There's no "trust graph" involved, it's strictly local information.

@cossio: Consider the package transfer scenario—how is the user supposed to know if this situation is safe or it's an attack? They're not actually in a position to know that.

StefanKarpinski avatar Apr 25 '22 16:04 StefanKarpinski

I know you didn't, it just feels like moving in that direction without actually taking the final leap of signing releases & managing which kinds of signatures are ok :) From what I can tell, the "trust database" approach has TOFU (Trust On First Use) problems just like connecting to an SSH server for the first time or receiving a PGP encrypted email from an unknown sender. Just that it's about "well should I trust this Registry with this new UUID I haven't seen from it before?" instead of public key encryption, which is how I got to "why not have admins install a CA whose signatures are allowed from this registry" and hence code signing.

I am aware that it's not backwards compatible, but versions that don't support this can't participate anyway, as mentioned in the OP about whether it's breaking or not. :thinking:

Seelengrab avatar Apr 25 '22 17:04 Seelengrab

Implementing our own PKI is a bad idea, we're definitely not doing that.

StefanKarpinski avatar Apr 25 '22 17:04 StefanKarpinski

To elaborate on that: if you need a PKI, you're basically always better off leveraging an existing PKI that's actively maintained—the bigger and more active, the better. Which means you should, instead of building your own, use HTTPS. Which, in our case, means downloading things over HTTPS and trusting that the content of what you downloaded is valid. Rubbing signatures and public key encryption on things is fun and all, but you still need to find out which signatures to trust from somewhere.

StefanKarpinski avatar Apr 25 '22 17:04 StefanKarpinski

So you could do a package transfer like this

How would this work with versions that were published before the transfer? Can I still install them after RegistryA got removed (I presume not)?

Another thought - what's preventing me as a package author to have some malicious code running during __init__ or during precompilation that changes the trust entry for some package, install my custom malicious registry and publish package versions that way? I think as soon as we have a trust entry there, we need to ensure the integrity of that information. I just don't see how we can guarantee that without some form of PKI - the most famous form of "local only verification" is in the form of videogame tamper proofing or mobile phones, which is extremely often broken by having the private key for decryption/signature checking stored next to the stuff that's encrypted/signed (storing the keys in hardware like a TPM is what e.g. Apple is doing for their integrity checking on iPhones, but even that is broken into every few months).

Seelengrab avatar Apr 25 '22 17:04 Seelengrab

How would this work with versions that were published before the transfer?

If they're transferred to the other repository, then you can install them. The same version info can be published in multiple registries, it's all just unioned together.

Another thought - what's preventing me as a package author to have some malicious code running during __init__ or during precompilation that changes the trust entry for some package, install my custom malicious registry and publish package versions that way?

If the attacker is already running arbitrary code on your system, why do they need to do any of the other stuff?

StefanKarpinski avatar Apr 25 '22 18:04 StefanKarpinski

But I'm not sure that's actually a useful feature: trust is pretty much all or nothing here. When you have trust = true in a registry's file, there's no reason not to trust it with everything. I'm not even entirely convinced that having a trust list for specific packages is useful. Why would you trust a registry to make unofficial releases of some packages but not others?

I'm not sure I'm following here. The scenario that is of primary interest to me is having General plus a company internal registry. I want to merge the registry information in exactly two cases:

  1. Private packages have been open sourced and new versions are registered in General.
  2. Private versions of packages from General are published in the internal registry.

In both cases I want the company registry to dictate whether it is allowed, on a package per package basis, to merge versions with General and not leave that to the individual users, who in a majority of cases won't have any idea. If any UUID appears in both the company registry and in General, which hasn't been explicitly whitelisted, I want Pkg to refuse merging and preferably be noisy enough that the situation is escalated within the company.

GunnarFarneback avatar Apr 25 '22 18:04 GunnarFarneback

You're describing a situation where you trust your company registry, so it would be a trusted registry.

If any UUID appears in both the company registry and in General, which hasn't been explicitly whitelisted, I want Pkg to refuse merging and preferably be noisy enough that the situation is escalated within the company.

Whitelisted where? How to do the whitelisting is exactly the question. Let's say there are two registries: Internal and General; Internal is a trusted registry, General is not. If a package appears in General first and then a version is published in Internal, that version will be trusted because Internal is a trusted registry. If a package appears in Internal first and then a version of it appears in General, the question is whether that is an attack or an intentional publication of a previously internal package. How does one distinguish the two situations? Some indication that it's ok for the General registry to publish versions of the package has to appear in the Internal registry. That's what I'm proposing: you indicate that it's ok by putting trust = "23338594-aafe-5451-b93e-139f81909106" (the General UUID) or something like that in the Package.toml file for the package in question—in the Internal registry.

StefanKarpinski avatar Apr 25 '22 19:04 StefanKarpinski

In essence, what I'm saying is that you need two things:

  1. A way to indicate that a registry (like Internal) is trusted and you can use any version it publishes.
  2. A way for a registry that you trust for some package to delegate its ability to other registries.

The first one is pretty simple: it's a registry-level boolean flag. Details of the trust delegation feature remain a bit fuzzy, but that's what we need work out.

Consider transitive delegation, for example. If Internal delegates the ability to publish new versions of a package to General and General delegates to Other, is that allowed? As a rule of thumb, conservatism suggests no, but on the other hand, if someone doesn't know about Internal, then the delegation from General to Other would be fine, so it's a little weird if knowing about Internal prevents General from delegating to Other when it would work for people who don't know about Internal. So that suggests that transitive delegation should work.
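If transitive delegation is allowed, the set of registries that may publish a package is simply everything reachable from the directly trusted ones. The sketch below assumes a per-package mapping from each registry to the registries it delegates to; the function name and data layout are hypothetical.

```python
from typing import Iterable, Mapping, Set


def trusted_publishers(
    roots: Iterable[str],
    delegations: Mapping[str, Iterable[str]],
) -> Set[str]:
    """Follow delegations transitively, starting from directly trusted registries."""
    seen = set(roots)
    stack = list(seen)
    while stack:
        current = stack.pop()
        for target in delegations.get(current, ()):
            if target not in seen:  # also guards against delegation cycles
                seen.add(target)
                stack.append(target)
    return seen


delegations = {"Internal": ["General"], "General": ["Other"]}
print(sorted(trusted_publishers({"Internal"}, delegations)))
# ['General', 'Internal', 'Other']
print(sorted(trusted_publishers({"General"}, delegations)))
# ['General', 'Other']
```

Both kinds of user end up trusting Other: one who knows about Internal and one who only knows about General. That consistency is the argument made above for letting delegation be transitive.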

Another question is whether delegated trust is persistent or not. If Internal delegates to General and then removes that delegation, do we keep trusting it or stop? What if the Internal registry is deleted or the package in question is deleted from Internal, which would leave the package only in General. In the former case where a delegation is removed, it should not persist. In the latter case, where the package is deleted from Internal entirely (say you want to transfer it to General fully), then that trust should persist. You could, in that situation ask the user to muck around with deleting registries or clearing their trust database, but it seems better if it works automatically.

I'll work on writing up a proposal.

StefanKarpinski avatar Apr 25 '22 20:04 StefanKarpinski

I guess we somewhat agree, cf https://github.com/JuliaLang/Pkg.jl/issues/2393#issuecomment-777286926. The way I see it, General should say that it is fine with any merges (at the registry level) and the company registry should say on a registry level that merges are not allowed, unless the respective package specifies that it may be merged with specified registries. Pkg should require that all involved registries allow the merge.

I'm not really seeing the point or value of the temporal ordering. Possibly because I'm not understanding this delegation stuff in the first place.

GunnarFarneback avatar Apr 25 '22 20:04 GunnarFarneback