
Add support for pinning a package to a specific index

Open charliermarsh opened this issue 2 years ago • 37 comments

Discussed this with Armin -- pip doesn't support it, and it seems like a big problem? If you have an internal index, but also want to get some packages from PyPI, there's no way to ensure that your internal packages come from your internal index. Packages on PyPI could even shadow them.

charliermarsh avatar Oct 23 '23 15:10 charliermarsh

https://github.com/pypa/pip/issues/8606

charliermarsh avatar Oct 26 '23 05:10 charliermarsh

https://github.com/python-poetry/poetry/issues/6713

charliermarsh avatar Oct 26 '23 05:10 charliermarsh

I don't quite understand why this is so hard (as per the pip issue). I can't tell if it's hard because it's a large conceptual change for pip specifically, because pip is large and complicated and any changes are hard, or because there's inherent complexity.

charliermarsh avatar Oct 26 '23 05:10 charliermarsh

Poetry's design is interesting: https://python-poetry.org/docs/repositories/. It feels a bit more complex than is necessary though.

charliermarsh avatar Oct 26 '23 05:10 charliermarsh

Please consider dependency confusion attacks: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610

Use of --extra-index-url as it is presently used is a security vulnerability.

PEP 708 is a yet-to-be-implemented approach to improving the security posture.

groodt avatar Feb 19 '24 02:02 groodt

I would love to implement improvements to this (like the ability to pin a dependency to a specific index)… We specifically held off and implemented indexes as-is to be spec-compliant. I’ll implement this as soon as it’s supported and there’s clarity on how installers should handle it!

charliermarsh avatar Feb 19 '24 04:02 charliermarsh

Makes sense.

I think you may receive a lot of duplicate feature requests from the folks who do misuse --extra-index-url and who aren't aware that it is not currently intended to be used to append additional sources of dependencies. Its purpose is to provide a set of fallback mirrors of the primary index (--index-url), which is PyPI in the general case.

We may need to consider offering some help as mentioned in the PEP to move this along.

This PEP has been provisionally accepted, with the following required conditions before the PEP is made Final:

  • An implementation of the PEP in PyPI (Warehouse) including any necessary UI elements to allow project owners to set the tracking data.
  • An implementation of the PEP in at least one repository other than PyPI, as you can’t really test merging indexes without at least two indexes.
  • An implementation of the PEP in pip, which supports the intended semantics and can be used to demonstrate that the expected security benefits are achieved. This implementation will need to be “off by default” initially, which means that users will have to opt in to testing it. Ideally, we should collect explicit positive reports from users (both project owners and project users) who have successfully tried out the new feature, rather than just assuming that “no news is good news”.

In the short-term, if you don't want to bug-for-bug implement pip, we may need to point people at alternatives like https://github.com/uranusjr/simpleindex to help them merge indexes behind the scenes on localhost. I don't think many will like it 😂

groodt avatar Feb 19 '24 04:02 groodt

@groodt can you point at official docs that confirm this use of extra index is indeed misuse? To me it fits well with local versions for example?

In any case, thanks for the link to simpleindex, this will be useful in case you're right and uv does not allow this use of extra index :) A bit sad to have to run two local servers instead of one, but that's not that bad.

pawamoy avatar Feb 19 '24 07:02 pawamoy

I don't quite understand why this is so hard

From what I recall (I quickly refreshed my memory on the issue but didn’t review the details) it’s a “design decision that was made a long time ago” type of complexity. Pip has built a bunch of choices on top of the “all indexes are equal and get merged” model, and it’s hard to know what we’d need to change if we were to revisit that decision. Add to that the need for backward compatibility and it’s a much bigger question than people saying “just pick the index I ask for” will accept.

For uv, I foresee the following issues:

  1. Compatibility - you’ll diverge from pip in both subtle and not-so-subtle ways. This isn’t a bad thing, but your stated intention is to be compatible with pip, so you need to make choices here.
  2. Being an outlier - the “all indexes are equal” model is not just something pip does. Standards like PEP 708 are built on the idea, so you may have trouble fitting that standard into your implementation. More generally, you’ll have to make a bunch of UI and design decisions where there is less pre-existing experience to draw on.
  3. Scope creep - people start wanting things like index priorities (“get these from piwheels when you can, but fall back to PyPI if there’s nothing at all published on piwheels”) which have their own issues.
  4. You open yourself up to the idea that a package is no longer identified by just name and version - where it came from becomes important as well. This may have wider implications - for example, SBOM data might need extra information like source URL.

On the plus side, you’ll be offering a solution to something users have been wanting for a long time, and which is often characterised as a security issue. And practicality may well beat purity here.

can you point at official docs that confirm this use of extra index is indeed misuse

It doesn’t describe it as “misuse”, but the pip docs are clear that we treat all indexes as equal: “There is no ordering in the locations that are searched” from here. We (pip) could add a guide-type article discussing this in more depth, but we don’t have one currently.

pfmoore avatar Feb 19 '24 08:02 pfmoore

There’s a small security warning in the pip docs here

Using this option to search for packages which are not in the main repository (such as private packages) is unsafe, per a security vulnerability called [dependency confusion](https://azure.microsoft.com/en-us/resources/3-ways-to-mitigate-risk-using-private-package-feeds/): an attacker can claim the package on the public repository in a way that will ensure it gets chosen over the private package.

There is also an in progress pip PR to make this more explicit here https://github.com/pypa/pip/issues/11694

Here’s a major recent dependency confusion attack that impacted PyTorch (caused by instructions to use --extra-index-url): https://news.ycombinator.com/item?id=34202662

groodt avatar Feb 19 '24 09:02 groodt

@groodt I'm not sure whether this is meant to answer my previous question ("can you point at official docs that confirm this use of extra index is indeed misuse"), but I was especially referring to "its purpose is to provide a set of fallback mirrors of the primary index".

I distinguish three use-cases for extra indexes:

  1. extend main index with additional projects (subject to dependency confusion)
  2. extend specific projects in main index with additional versions (not subject to dependency confusion)
  3. mirror PyPI.org

My understanding of your "append additional sources of dependencies" was that it referred to case 2, but now I think you were speaking of case 1.

So, to rephrase, and given pip currently considers all indexes to be equal, is case 2 a misuse too?

(Anyway, after reading PEP 708, I also agree it's the way forward and not index ordering, as I commented here :+1: )

pawamoy avatar Feb 19 '24 09:02 pawamoy

Stepping back from the question of "why is this so hard", @groodt is correct that PEP 708 is the better solution here. Otherwise, does the user need to specify an explicit index for the whole of their internal package's dependency tree? What if someone adds a new internal module, and forgets to add it to the "must come from the internal index" list in all of their install jobs? Are we going to consider that as "user error"? The torchtriton attack took exactly this form. Or an attacker could compromise a public dependency of your internal project.

Having an index pin option doesn't prevent you from needing to handle the consequences of a "all distributions with the same name and version are interchangeable" model. It just gives users a manual way of firefighting issues with that model.

pfmoore avatar Feb 19 '24 09:02 pfmoore

So, to rephrase, and given pip currently considers all indexes to be equal, is case 2 a misuse too?

It's not that "pip considers all indexes to be equal" but rather that "pip considers all distributions with the same name and version to be interchangeable regardless of which index they came from". The difference is subtle but important. Whether case 2 is a problem depends on whether you trust the "main index", just the same as with case 1. The trust issue is what's important here rather than what is "considered equal"[^1].
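To make the "interchangeable regardless of index" model concrete, here is a minimal sketch. It is not pip's actual code: the index contents, the package name `acme-utils`, and the `resolve_merged` helper are all hypothetical, and versions are plain integer tuples to keep the example dependency-free.

```python
# Hypothetical in-memory indexes: index name -> {package name -> versions}.
# Versions are integer tuples so they compare without extra libraries.
INDEXES = {
    "internal": {"acme-utils": [(1, 0), (1, 1)]},
    "pypi": {"acme-utils": [(99, 0)], "requests": [(2, 31)]},
}


def resolve_merged(package, indexes):
    """Pool candidates from every index and pick the highest version.

    The origin index plays no part in the choice, which mirrors the
    "same name and version are interchangeable" model described above.
    """
    candidates = [
        (version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found in any index")
    # Highest version wins; the index name only breaks exact ties.
    return max(candidates)


version, origin = resolve_merged("acme-utils", INDEXES)
print(version, origin)
```

In this sketch the copy uploaded to the public index wins simply because its version number is higher, which is the shape of a dependency confusion attack.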

The --extra-index-url argument for pip was added long ago, in simpler times when people weren't anywhere near as worried about attackers targeting PyPI, and nobody was running multinational businesses based on Python code. Trust was generally assumed to be present, and in particular no-one was worrying about relative trust between indexes. Things have changed, for better or worse, and --extra-index-url doesn't match the new reality. This is why one of the options pip is considering is simply removing it, and demanding that users pick a single index and ensure that they trust that one index. That's unlikely to ever happen because the breakage would be significant, but it's absolutely an option that a new project like uv could (and should) consider.

To answer your question more explicitly, your case (2) is a "misuse" in the sense that it has risks (the same as case 1). The risks may not be a dependency confusion attack, but they do include compromise of the PyPI account owning the code you're extending. Is that a more acceptable risk? Only you can decide that.

The point is that mixing indexes with different trust levels[^2] is the problem here. Even the "mirror PyPI" case involves a risk if the mirror is compromised.

[^1]: You could argue that if you don't trust two indexes equally, you don't "consider them equal", I guess...

[^2]: Within the same install command, which is why pinning for a set of named packages isn't a sufficient solution.

pfmoore avatar Feb 19 '24 09:02 pfmoore

Thanks @pfmoore!

(Side note: I just discovered that PDM supports respecting the order of indexes: https://pdm-project.org/latest/usage/config/#respect-the-order-of-the-sources.)

pawamoy avatar Feb 19 '24 11:02 pawamoy

Just as an example (and yes, it's a pathological case) if you have ordering, suppose you have index1 and index2 (ordered with 1 having priority over 2). Index 1 contains A 1.0 and B 1.0, with A 1.0 depending on B. Index 2 has B 2.0. If you install A, do you get B 1.0 or B 2.0? If the answer is 1.0, why did you bother specifying index 2? If you get B 2.0, and someone now adds A 2.0 to index 2, and changes B 2.0 to depend on A > 1.0, is the correct thing to upgrade A to 2.0, or downgrade B to 1.0?

The point here is that there are multiple possible choices, and if you don't factor index priority into the core of your resolution algorithm, you end up with a system that users won't have a good intuition about, and which might depend on implementation details, both of which can lead to security issues. I'm not saying that PDM has such a problem - they may well have considered all of this. Just that "which index did this come from" is an extra axis you have to consider as part of resolution, not just something you can keep separate from the resolver.
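The first half of the index1/index2 example can be sketched as follows. This is an illustration only, not how pip, PDM, or uv resolve: both selection functions are hypothetical, dependency metadata is left out, and versions are integer tuples.

```python
# Index contents from the example above; versions are integer tuples.
INDEX_1 = {"A": [(1, 0)], "B": [(1, 0)]}
INDEX_2 = {"B": [(2, 0)]}
ORDERED_INDEXES = [("index1", INDEX_1), ("index2", INDEX_2)]


def pick_first_index_wins(package, ordered_indexes):
    """Take the best version from the first index that has the package."""
    for name, index in ordered_indexes:
        if package in index:
            return max(index[package]), name
    raise LookupError(package)


def pick_highest_anywhere(package, ordered_indexes):
    """Take the best version across all indexes; order only breaks ties."""
    best = None
    for name, index in ordered_indexes:
        for version in index.get(package, []):
            if best is None or version > best[0]:
                best = (version, name)
    if best is None:
        raise LookupError(package)
    return best


# A depends on B, so installing A also needs a choice for B.
print(pick_first_index_wins("B", ORDERED_INDEXES))
print(pick_highest_anywhere("B", ORDERED_INDEXES))
```

Both policies honour the configured order in some sense, yet they select different versions of B from different indexes, so the policy has to be a deliberate part of the resolver's design.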

Anyway, the key point for uv, and in answer to the question @charliermarsh asked above, is that this is why pip is still trying to work out a good answer to how we could allow users to choose which index to use at a more fine grained level than "per install".

pfmoore avatar Feb 19 '24 11:02 pfmoore

If the answer is 1.0, why did you bother specifying index 2?

To be able to find package C, D, E, etc. :smile: In my case, all packages from index 1 take precedence, even if higher versions are available in index 2. Index 1 contains just a tiny few packages, index 2 contains all the rest (PyPI.org in my case). Well, pypiserver can redirect to a fallback index when a package isn't found, so a single index (the local pypiserver one) is enough for me, but if it didn't have this feature, I'd have to continue relying on index-url + extra-index-url, with the limitation that I cannot force packages to be fetched from one of the two.

PEP 708 will bring the same ability with finer-grain control (per-project fallback). I'll see if pypiserver maintainers are interested in supporting it :slightly_smiling_face:

Could also be interesting to know how PDM handles ordering. If @frostming wants to chime in :smile:

pawamoy avatar Feb 19 '24 13:02 pawamoy

In my case, all packages from index 1 take precedence, even if higher versions are available in index 2.

Cool, so your approach to priority is like pinning, but with the decision on whether to pin being "if package A is in index 1, then pin to index 1, else fall through". That works, but it doesn't support the piwheels case where they supplement PyPI with wheels for the raspberry pi architecture, letting installers fall back to PyPI if the wheel isn't valid for the user's architecture (at least that's how I understand what they do...)

Getting into this much detail may be more than the uv maintainers want, though. Let's see if this is useful to them. I don't want to come across as saying that this is too hard to do - it's too hard for pip to easily do (which is what triggered my original comment) but uv may well have different priorities and choose different trade-offs. They also don't have backward compatibility to deal with (unless they want to match pip feature for feature).

pfmoore avatar Feb 19 '24 13:02 pfmoore

@pfmoore Thank you very much for your detailed comments here. They've been super helpful.

I'd like to make a small tweak to how uv behaves today that will hopefully resolve at least some of the issues users are hitting (#1377, #1600, #1451) in practice, but do it in a way that doesn't make uv behave like pip (which, AIUI, is to consider all available packages from all indexes, without any guaranteed priority order).

So today, our implementation works by giving a preference order to the indexes made available to uv:

https://github.com/astral-sh/uv/blob/995fba8fec6ac6569e5f7ffe8d244faddef98692/crates/uv-client/src/registry_client.rs#L191-L210

That is, given uv pip install --index-url foo --extra-index-url bar --extra-index-url quux package, it will first try foo, then bar and then quux. Once a package is found, uv will stop looking for it in any other indexes. So for example, if package is found in bar, then that means it definitely wasn't found in foo, and it may or may not be in quux. We never check.
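The first-match behaviour described here can be sketched in a few lines. This is not uv's implementation (that is the Rust code linked above); `find_first_index` and the `contents` mapping are hypothetical stand-ins for real registry lookups.

```python
def find_first_index(package, index_order, contents):
    """Return the first index in `index_order` that serves `package`.

    `contents` is a hypothetical stand-in for network lookups:
    index name -> set of package names it serves.
    Also returns the list of indexes that were actually consulted.
    """
    checked = []
    for index in index_order:
        checked.append(index)
        if package in contents.get(index, set()):
            return index, checked  # stop: later indexes are never queried
    return None, checked


contents = {"foo": {"other"}, "bar": {"package"}, "quux": {"package"}}
found, checked = find_first_index("package", ["foo", "bar", "quux"], contents)
print(found, checked)
```

Here the package is found in bar, so quux is never consulted even though it also serves the package.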

Since --index-url defaults to PyPI, that means something like uv pip install --extra-index-url bar package will check PyPI first for package, and if it's found, stop. I think this turns out to be the reverse of what the commonly desired behavior is. And while it won't address every use case, I think a nice stop-gap solution here would be to flip the preference order with respect to --index-url and --extra-index-url. That is, we check PyPI after all other extra index URLs given.

This will not match pip's behavior in every case, but I think it does help address some of the common cases and I think also helps to mitigate the dependency confusion concerns. That is, if package is in bar, then uv would completely ignore any packages named package on PyPI. (Today, that's flipped. If package is on PyPI, then it will completely ignore any package named package on bar.)
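A small sketch of the effect of flipping the order, under stated assumptions: `index_order` and `serving_index` are hypothetical helpers, the `extras_first` switch is not a real uv option, and the order among multiple extra indexes is simply kept as given.

```python
PYPI = "https://pypi.org/simple"


def index_order(extra_index_urls, index_url=PYPI, extras_first=False):
    """Build the lookup order for a first-match strategy."""
    extras = list(extra_index_urls)
    return extras + [index_url] if extras_first else [index_url] + extras


def serving_index(package, order, contents):
    """Return the first index in `order` that lists `package`, else None."""
    return next((i for i in order if package in contents.get(i, ())), None)


# Hypothetical contents: the private name "package" was also claimed on PyPI.
contents = {PYPI: {"package", "requests"}, "bar": {"package"}}

today = index_order(["bar"])
proposed = index_order(["bar"], extras_first=True)
print(serving_index("package", today, contents))      # served by PyPI
print(serving_index("package", proposed, contents))   # served by bar
print(serving_index("requests", proposed, contents))  # falls through to PyPI
```

With the flipped order, a name that exists on the extra index is never looked up on PyPI, while packages that only exist on PyPI still resolve.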

Otherwise, I do agree that if we can get away with it, we should probably avoid encoding pip's behavior of including packages from all indexes without regard to priority into uv. But we can certainly revisit this if my tweak above doesn't pan out.

And popping up a level, I do think we'll want to absolutely address the multi-registry issue by giving users more control when we build out more project management features. But I think until then, we'll probably want to avoid adding too many additional abstractions into uv pip install for dealing with multiple registries. And of course, I suspect we will want PEP 708 support eventually too.

BurntSushi avatar Feb 29 '24 12:02 BurntSushi

Nice! Worth noting is users who want the flipped behavior can do uv pip install --index-url https://private-index.com/simple --extra-index-url https://pypi.org/simple, right?

pawamoy avatar Feb 29 '24 13:02 pawamoy

Nice! Worth noting is users who want the flipped behavior can do uv pip install --index-url https://private-index.com/simple --extra-index-url https://pypi.org/simple, right?

Ah yes! I meant to call that out, but yes indeed.

BurntSushi avatar Feb 29 '24 13:02 BurntSushi

While setting priority is good, in many cases it's only a bandaid while a proper solution would be enforcing association between specific packages and registries. This is not something that can be fixed server side; this has to be on the client, without any fallbacks or flexible behaviours. So if a package is not found in the registry, install (or any other operation) has to fail. This is to prevent supply chain attacks -- any other behaviour creates scenarios where due to intermittent problems (e.g. network) a malicious package can be installed.
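A minimal sketch of that fail-closed behaviour, assuming a client-side mapping from package names to registries. Nothing here is an existing uv or pip API: `resolve_pinned`, `PinnedIndexError`, the `pins` mapping, and the index names are all hypothetical.

```python
class PinnedIndexError(LookupError):
    """Raised when a package is missing from the only index allowed for it."""


def resolve_pinned(package, pins, default_index, contents):
    """Look `package` up in exactly one index and never fall back.

    `pins` maps package names to the only index allowed to serve them;
    unpinned packages go to `default_index`. `contents` is a hypothetical
    stand-in for real registry lookups (index name -> set of package names).
    """
    index = pins.get(package, default_index)
    if package not in contents.get(index, set()):
        # Fail closed: a missing or unreachable registry is an error,
        # not a reason to try another source.
        raise PinnedIndexError(f"{package!r} not available from {index!r}")
    return index


# Simulate an outage of the internal registry while a same-named
# package exists on the public index.
contents = {"internal": set(), "pypi": {"acme-utils", "requests"}}
pins = {"acme-utils": "internal"}

print(resolve_pinned("requests", pins, "pypi", contents))
try:
    resolve_pinned("acme-utils", pins, "pypi", contents)
except PinnedIndexError as exc:
    print("install aborted:", exc)
```

The pinned lookup aborts during the simulated outage instead of quietly picking up the public package with the same name.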

vlad-ivanov-name avatar Mar 01 '24 16:03 vlad-ivanov-name

... while a proper solution would be enforcing association between specific packages and registries.

I think this is roughly what I meant above with:

And popping up a level, I do think we'll want to absolutely address the multi-registry issue by giving users more control when we build out more project management features.

BurntSushi avatar Mar 01 '24 16:03 BurntSushi

Thank you for the reply. Do you think this feature could be implemented in API sooner than implementing the UX for it in the command line? pixi recently switched to uv as a Python backend, and as they don't implement pip UX/UI, just an API that would be more detailed than the current "flat" package source from multiple URLs would already enable it there.

vlad-ivanov-name avatar Mar 01 '24 16:03 vlad-ivanov-name

I have a similar problem. CASE A:

  • Let's say you have a PKG with 1 yanked version available in UV_INDEX_URL or pypi.org
  • You have the same package in your UV_EXTRA_INDEX_URL, but with valid versions.
  1. when PKG is a dependency of something that we try to install, it works with latest uv version. :ok:
  2. trying to install the PKG directly, gives an error: :no_entry:
error: Missing `Content-Type` header for <EXTRA_INDEX_URL>/PKG

CASE B:

  • When you have multiple indexes in UV_EXTRA_INDEX_URL and the PKG is in the first extra:
  1. same behaviour as with just 1 extra :no_entry:

CASE C:

  • When the PKG is not in the first extra, but in subsequent extra indexes:
  1. installation is not possible, it only looks for the package in the UV_INDEX_URL :no_entry:
× No solution found when resolving dependencies:
  ╰─▶ Because only PKG==0.0.0a1 is available and PKG==0.0.0a1 is unusable because it was yanked (reason: not stable), we can conclude that all versions of PKG cannot be used.
      And because you require PKG, we can conclude that the requirements are unsatisfiable.

      hint: PKG was requested with a pre-release marker (e.g., any of:
          PKG<0.0.0a1
          PKG>0.0.0a1
      ), but pre-releases weren't enabled (try: `--prerelease=allow`)

atti92 avatar Mar 05 '24 19:03 atti92

What's the case in which you're seeing error: Missing Content-Type header for <EXTRA_INDEX_URL>/PKG? That seems like a bug in the registry, but perhaps we can handle it gracefully.

charliermarsh avatar Mar 05 '24 19:03 charliermarsh

What's the case in which you're seeing error: Missing Content-Type header for <EXTRA_INDEX_URL>/PKG? That seems like a bug in the registry, but perhaps we can handle it gracefully.

  • UV_INDEX_URL is undefined.
  • UV_EXTRA_INDEX_URL points to a NEXUS instance with pypi repo

It only happens if I try to install a package that is available in both pypi.org and in the nexus. Installing packages that are only in the nexus or pypi works. Also it only happens when installing the package directly with uv pip install PKG

atti92 avatar Mar 05 '24 19:03 atti92

I think you want https://github.com/astral-sh/uv/issues/1754.

charliermarsh avatar Mar 05 '24 19:03 charliermarsh

Yes I can confirm using --no-cache removes the error message.

Do you know anything about using multiple extra-index-urls? Is that supposed to work? It only resolved the package if it was in the first one; I tried --extra-index-url and the env variable too.

We have multiple GitLab registries, on top of the Nexus and PyPI, and we need to install many packages from all the different repositories.

atti92 avatar Mar 05 '24 19:03 atti92

Do you know anything about using multiple extra-index-urls, is that supposed to work? it only resolved the package if it was in the first one, tried --extra-index-url and env too.

See https://github.com/astral-sh/uv/pull/2083

BurntSushi avatar Mar 05 '24 19:03 BurntSushi

Do you know anything about using multiple extra-index-urls, is that supposed to work? it only resolved the package if it was in the first one, tried --extra-index-url and env too.

See #2083

My package is only available in one of the extra-index-urls outside of index-url, but it still cannot find it, if it's not the first.

atti92 avatar Mar 05 '24 20:03 atti92