Add ability to require all pypi installed packages be explicitly listed

Open tacaswell opened this issue 11 months ago • 6 comments

Problem description

Layering packages from PyPI on top of a conda env can be very helpful (for pure-Python things that are not yet packaged for conda-forge) or required (for local installs), but it is also a huge foot cannon: the wheels may bring in incompatible binaries, pull things from PyPI unnecessarily that do exist on CF, need to be built from an sdist, or mess with already installed packages. To that end, a setting that would cause

pixi add --pypi foo

to fail if any package other than foo would be added to the lock file would go a long way to preventing users from getting into bad situations. Maybe this should be phrased as "if any PyPI package not explicitly listed in the config is in the lock file, fail the solve" (I'm still working on getting my head fully around pixi's view of the world).

Ideally I would like an error message that says something like

You asked to install `foo`, this depends on `bob`, `baz`, and `bar`.  

Please explicitly add these as dependencies or disable the safety feature you tripped.  

I would also ideally like this to be the default, but I suspect that would be a pretty disruptive change. I have an issue open on pip to add a similar flag there ( https://github.com/pypa/pip/issues/12958 ) and have locally been using a version of pip with a patch to implement this (https://github.com/tacaswell/pip/tree/hacky_hack) and it is super useful (in a non-pixi context) to make sure things do not get accidentally reinstalled/downgraded or unexpected dependencies do not pop in.

This is related to https://github.com/prefix-dev/pixi/issues/1417, which would also let users avoid surprise installations from PyPI but would delay the discovery of a missing dependency until the user tries to use it.

Related to https://github.com/prefix-dev/pixi/issues/2238 in that it concerns how packages available in both places are handled, but with a different valence for how the conflict gets resolved.

This will likely also require https://github.com/prefix-dev/pixi/issues/2764 to be implemented as I know of at least one package (siphash24) that does not have CF packages for all of the platforms.

tacaswell avatar Jan 22 '25 21:01 tacaswell

Hey @tacaswell,

I feel like you would really fit the team in terms of thinking about a perfect world 😉.

The change you propose, as I understand it, would be pretty breaking for the expected workflow of most users. We often hear that people don't want to care about any of this and just want their packages installed. If they're used to pip, they're also used to dealing with this, unfortunately.

As you explain, the PyPI ecosystem has few protections in place to make sure the dependencies work together nicely. This is why we often advise users to get as much from conda-forge as they can.

After that, pixi kind of has to obey the requirements given by the user, as it would be extremely hard to force users to manage every single requirement they need, and this would also go against the vision of solving the pain of managing packages. Pixi is currently trying to manage the split between being opinionated and strict, to protect users from making mistakes, and still allowing the flexibility to overcome issues introduced by wrongly configured package metadata.

Pixi solves the problem of mismatching PyPI and Conda packages by figuring out what was already installed as a Conda package and then not installing the PyPI variant of it. This is broken for every other packaging tool, as the names of the packages might not align. For example, the same package is `torch` on PyPI but `pytorch` on conda-forge.
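The matching described above can be illustrated with a minimal sketch. This is not pixi's actual implementation: the `PYPI_TO_CONDA` table and the `pypi_still_needed` function are hypothetical names made up for this example, and the only mapping entry is the `torch`/`pytorch` pair mentioned in this comment.

```python
# Hypothetical, illustrative mapping from PyPI names to conda-forge names.
# Only the torch/pytorch pair comes from the discussion; a real tool would
# need a much larger, maintained table.
PYPI_TO_CONDA = {"torch": "pytorch"}


def pypi_still_needed(pypi_requirements, conda_installed, mapping=PYPI_TO_CONDA):
    """Return the PyPI requirements that are not already covered by conda.

    A PyPI name is looked up under its conda name when the two differ;
    otherwise the names are assumed to be identical.
    """
    installed = set(conda_installed)
    needed = []
    for name in pypi_requirements:
        conda_name = mapping.get(name, name)
        if conda_name not in installed:
            needed.append(name)
    return needed
```

With `["torch", "requests", "foo"]` requested from PyPI and `["pytorch", "requests"]` already installed from conda, only `foo` would still have to come from PyPI.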

This feature would be a big step in the right direction from our point of view. But it's pretty complex and has been pushed back many times already...

Next to building environment management in pixi we're also building rattler-build and pixi build which are going to make it a lot more comfortable to release packages on conda-forge. This is our way of making sure there are high quality packages available for pixi to install, without getting into the issues you describe.

In the end, pixi is not a PyPI tool but a general-purpose package manager. The PyPI integration is required because PyPI is the biggest source of problems for conda environments. For example, pixi works perfectly with cargo, as we do in pixi itself, and the same goes for C/C++, R, etc.

Hope this explains it a little more.

ruben-arts avatar Jan 23 '25 09:01 ruben-arts

Sorry, I missed the notification for this last week!

> as it would be extremely hard to enforce users to take care of managing every single requirement they need and this would also go against the vision of solving the pain of managing packages.

I would argue that if you are layering PyPI on top of CF you have already signed up for a bunch of pain in managing your packages. Currently I'm solving this by grepping the lock file after adding a PyPI dep and adding conda dependencies until only what I expect to come from PyPI is coming from it. Having the tool do this check on every generation of the lock file means that if dependencies change it still gets caught, and the barrier for more people to opt into accepting a bit of up-front pain to avoid pain down the road goes down.

I agree on-by-default is probably disruptive, and for many users the YOLO approach may still be reasonable, but for cases where people are using pixi to define environments for deploying software, accepting the extra complexity for the assurance that the packages are coming from where you think they are is very valuable.
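The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it takes the two lists of names as plain inputs rather than parsing a real manifest or `pixi.lock`, and `unlisted_pypi_packages` and `check_lock` are hypothetical names, not part of pixi. The name normalization follows the PyPI convention (PEP 503), so `Foo_Bar` and `foo-bar` compare equal.

```python
import re


def normalize(name):
    """Normalize a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unlisted_pypi_packages(listed, locked):
    """Return locked PyPI package names that were never listed explicitly.

    `listed` stands for the PyPI dependencies named in the config and
    `locked` for the PyPI packages found in the lock file; both are
    assumed to have been extracted elsewhere.
    """
    listed_norm = {normalize(n) for n in listed}
    locked_norm = {normalize(n) for n in locked}
    return sorted(locked_norm - listed_norm)


def check_lock(listed, locked):
    """Raise if the lock file contains PyPI packages nobody asked for."""
    extra = unlisted_pypi_packages(listed, locked)
    if extra:
        names = ", ".join(f"`{n}`" for n in extra)
        raise RuntimeError(
            f"PyPI packages in the lock file that are not explicitly listed: {names}. "
            "Please explicitly add these as dependencies or disable this check."
        )
```

Running something like `check_lock` on every regeneration of the lock file is what turns the manual grep into a guard that also catches dependencies that change later.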

tacaswell avatar Jan 29 '25 17:01 tacaswell

Let's see if this issue gets traction. I feel like this is not just a small feature addition, so if this is important to more people we could maybe create a good design for it.

ruben-arts avatar Jan 30 '25 10:01 ruben-arts

I'm re-reading this now, still unsure how to go about this. The least disruptive thing I can think of would be a warning you need to explicitly turn on, or a global/manifest setting that disallows transitive PyPI installations. Another option would be the much-discussed https://github.com/prefix-dev/pixi/issues/1417 (no-deps) option, which would allow opting out of the dependencies altogether.

tdejager avatar Oct 10 '25 06:10 tdejager

I think an opt-in error is better, but an opt-in warning would be enough.

--no-deps may also have a use, but does not solve the problem I want to solve (it does get the "only what I asked for" but not "definitely has all the dependencies").
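The difference between the two behaviours can be shown with a toy model. Everything here is an assumption for illustration: the `requires` dependency table is made up, and `"no-deps"` and `"strict"` are just labels for the two policies in this thread, not existing pixi settings.

```python
def transitive_deps(package, requires):
    """Collect every direct and indirect dependency of `package`."""
    seen = set()
    stack = list(requires.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(requires.get(dep, []))
    return seen


def resolve(requested, requires, listed, policy):
    """Model what ends up installed under each policy.

    "no-deps": install only what was named; unlisted dependencies are
    silently absent and surface only at runtime.
    "strict": refuse to proceed while any dependency is unlisted.
    """
    deps = transitive_deps(requested, requires)
    unlisted = sorted(deps - set(listed))
    if policy == "no-deps":
        installed = sorted({requested} | set(listed))
        return {"installed": installed, "missing_at_runtime": unlisted}
    if policy == "strict":
        if unlisted:
            raise RuntimeError(f"unlisted dependencies: {', '.join(unlisted)}")
        installed = sorted({requested} | set(listed))
        return {"installed": installed, "missing_at_runtime": []}
    raise ValueError(f"unknown policy: {policy}")
```

In this model both policies install "only what I asked for", but only the strict one guarantees at solve time that nothing is missing.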

tacaswell avatar Oct 10 '25 12:10 tacaswell

Yeah opt-in error would be fine as well :)

tdejager avatar Oct 10 '25 13:10 tdejager