
Improve pypi-mapping fetch logic

Open ruben-arts opened this issue 9 months ago • 17 comments

The hash-mapping is fetched from a non-standard https://conda-mapping.prefix.dev location.

This should be configurable, and if it's not usable, the workaround should be clear. That is either redirecting to a local compressed mapping file or writing the map in the manifest file.

ruben-arts avatar Feb 24 '25 13:02 ruben-arts

That's the gist of my ask here: [^1]

  • https://github.com/prefix-dev/pixi/issues/3179#issuecomment-2680135776

Happy to close that in favour of tracking the request here.


[^1]: TBF the title of the issue was terrible, so I just updated it.

dhirschfeld avatar Jul 15 '25 09:07 dhirschfeld

Curious about the status of this and #3179. It would be very difficult to get our IT to whitelist conda-mapping.prefix.dev, so any upstream workaround would be appreciated. I'd also suggest making this behavior more transparent in the docs. Currently it's pretty buried in https://pixi.sh/latest/reference/pixi_manifest/#conda-pypi-map-optional. Thanks!

y1zhou avatar Aug 18 '25 03:08 y1zhou

Hey @y1zhou, do you want to opt out of the mapping, or host it somewhere else yourself :)?

tdejager avatar Aug 22 '25 12:08 tdejager

> Hey @y1zhou, do you want to opt out of the mapping, or host it somewhere else yourself :)?

Hi @tdejager, if opting out is a possibility (I'm unsure how this mapping file is used exactly) I'd be more than happy to test it out! Self-hosting is also fine. Another solution would be hosting the mapping file somewhere that's likely to be already whitelisted? For example on conda-forge, PyPI, GitHub, etc.

y1zhou avatar Aug 22 '25 12:08 y1zhou

Yeah, so the problem is that it's not a single file but a collection of them :) The fallback can be a single file, so we could think of just pulling that from somewhere else; however, it does not work as well. The same problem applies to opting out: the mapping is very useful.

The most useful thing would be to port the parselmouth code (https://github.com/prefix-dev/parselmouth) to Rust so you could map "on-the-fly" if you opt out of the mapping. This would make the initial solve slower but would provide the most accurate mapping. This, however, is the most work.

You could look into how viable using parselmouth yourself is :) Then we can figure out if changing the location makes sense at all.

tdejager avatar Aug 22 '25 14:08 tdejager

Also, now that I'm rereading your comment, maybe we should also document how the mapping is used in the first place! This might also help convince corporate IT of its usefulness, although I may be too optimistic there 😄

tdejager avatar Aug 22 '25 17:08 tdejager

> Also, now that I'm rereading your comment, maybe we should also document how the mapping is used in the first place! This might also help convince corporate IT of its usefulness, although I may be too optimistic there 😄

That would be useful! I'm sure I could host it myself pretty easily if I could just figure out what exactly I need to do (plus change the source to make the location configurable) https://github.com/prefix-dev/pixi/blob/fbd8ab6232573831fa477aa538dae6429471d397/crates/pypi_mapping/src/prefix/hash_mapping_client.rs#L17

dhirschfeld avatar Aug 22 '25 21:08 dhirschfeld

Yeah, the thing is it's not easy to host, as it's now directly connected to Cloudflare I think; we could change that though. But more importantly it needs to run at a fixed interval to 'keep m mapping' so to say 😁

tdejager avatar Aug 23 '25 07:08 tdejager

I really just need to know what files to host, what endpoints to expose, and where I can source the files from (download or generate from code). I can handle the containerisation, Helm chart and nginx (or FastAPI) web server if I need to.

dhirschfeld avatar Aug 23 '25 07:08 dhirschfeld

> I really just need to know what files to host, what endpoints to expose, and where I can source the files from (download or generate from code). I can handle the containerisation, Helm chart and nginx (or FastAPI) web server if I need to.

I'll be making the case to security to allow that domain, as I'd rather not do the work, but security being security... ¯\_(ツ)_/¯

dhirschfeld avatar Aug 23 '25 07:08 dhirschfeld

Hi! First of all, great work on pixi — this is a really amazing tool!

Just to clarify my understanding (and maybe help structure the discussion), I think there are several points to address in this issue:

  • Better documentation on:
    • what conda-pypi-map is
    • how it can be modified
    • the consequences of opting out
    • where to find conda-pypi-map files on the internet (e.g. in an air-gapped cluster scenario)
    • how to set up an Artifactory/Nexus to serve these files
  • Making conda-pypi-map a globally configurable option
  • Something I didn’t fully understand regarding the idea of rewriting prefix-dev/parselmouth in Rust

Is there currently any PR in progress that tackles one of these points? If so, I could contribute, or at least follow along.

EDIT: This issue seems related: https://github.com/prefix-dev/pixi/issues/1795

xroynard avatar Aug 26 '25 11:08 xroynard

> I'm sure I could host it myself pretty easily if I could just figure out what exactly I need to do (plus change the source to make the location configurable)

It seems it is configurable at least:

  • https://pixi.sh/latest/reference/pixi_manifest/#conda-pypi-map-optional
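
For anyone who wants to try that option, here is a minimal Python sketch of maintaining a local mapping file. It assumes, based on a reading of the linked manifest reference that is worth double-checking, that the file is a flat JSON object mapping a conda package name to a PyPI package name. The helper name `update_mapping_file` and the package names in the usage comment are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def update_mapping_file(path: Path, new_entries: Dict[str, str]) -> Dict[str, str]:
    """Merge conda-name -> PyPI-name pairs into a local JSON mapping file.

    Assumption: the file is a flat JSON object of string keys and values.
    Existing entries are kept; entries in new_entries override same-named keys.
    """
    mapping: Dict[str, str] = {}
    if path.exists():
        mapping = json.loads(path.read_text(encoding="utf-8"))
    mapping.update(new_entries)
    # Sorted keys keep the file stable under version control.
    path.write_text(
        json.dumps(mapping, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return mapping


# Hypothetical usage (package names are placeholders, not real mappings):
# update_mapping_file(Path("map.json"), {"my-conda-pkg": "my-pypi-pkg"})
```

The resulting file would then be the one a `conda-pypi-map` entry in the manifest points to. This sketch only manages the file on disk and says nothing about how pixi combines it with its default mapping.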

dhirschfeld avatar Aug 26 '25 11:08 dhirschfeld

I think that does tie you into the simplified mapping, right @nichmor?

tdejager avatar Aug 26 '25 17:08 tdejager

I ran into this yesterday in a setup with a partial conda-forge mirror in Artifactory. For some newer releases of packages, the sha256 hash wasn't available from https://conda-mapping.prefix.dev/ yet.

An example would be https://conda-mapping.prefix.dev/hash-v0/8dc54e94721e9ab545d7234aa5192b74102263d3e704e6d0c8aa7008f2da2a7b for requests-2.32.5-pyhd8ed1ab_0.conda.
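
As a rough illustration of the lookup in that example, here is a minimal Python sketch that builds the per-package URL from a sha256 and treats a 404 as "not mapped yet". The `/hash-v0/<sha256>` pattern is taken from the URL above. The function names and the assumption that the response body is JSON are illustrative only and not confirmed in this thread.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Base URL taken from the example link in this comment.
HASH_MAPPING_BASE = "https://conda-mapping.prefix.dev/hash-v0"


def hash_mapping_url(sha256: str, base: str = HASH_MAPPING_BASE) -> str:
    """Build the lookup URL for a conda package's sha256 hex digest."""
    digest = sha256.strip().lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 64-character hex sha256 digest")
    return f"{base.rstrip('/')}/{digest}"


def fetch_hash_entry(
    sha256: str, base: str = HASH_MAPPING_BASE, timeout: float = 10.0
) -> Optional[dict]:
    """Return the decoded entry, or None if the hash is not published (404).

    Assumption: the server answers with a JSON document. Other HTTP errors
    are re-raised so that a blocked domain is not mistaken for a missing hash.
    """
    try:
        with urllib.request.urlopen(
            hash_mapping_url(sha256, base), timeout=timeout
        ) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Passing a different `base` is how a self-hosted mirror could be tried with the same sketch, provided it serves the same path layout.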

I think in this situation it would normally fall back to the compressed mapping, but it didn't because of the custom Artifactory channel. I mentioned this in the Discord, but I'd like to add an option to the machine config for setting conda-pypi-map. This would let corporate deployments of pixi configure the mappings for all users instead of each repo/user having to set it themselves.

EDIT: At the time I tried it, I was getting a not found error from Cloudflare for the above link, but it seems to have populated now.

tl-hbk avatar Aug 26 '25 17:08 tl-hbk

Just FYI: we try to avoid setting things in the global config, to reduce reproducibility issues. We do allow the global config to influence 'pixi init' a lot though, like changing the default channels.

The main exception to this rule, I believe (my info could be outdated), is authentication, as it basically does not belong in the local manifests. I think you are suggesting this would also be an exception?

tdejager avatar Aug 26 '25 17:08 tdejager

I think it makes sense to allow the mapping to be configurable in the global config, since it seems like it impacts solving with other channels. If I'm setting my default channels to some private channels, then I'd like the mappings available for them too.

EDIT: If not through conda-pypi-mapping it would be nice to be able to configure an optional mapping url/file for each channel through default-channels. Maybe something like

default-channels = [
    { url = "", pypi-mapping="" }
]
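
To make the idea concrete, here is a small Python sketch of how such entries might be read once parsed. The `pypi-mapping` key is only the proposal above and not an existing pixi option. The function name `mapping_sources` and the example URLs are hypothetical.

```python
from typing import Dict, List, Optional, Union

# A channel is either a plain name/URL string or a table as proposed above.
ChannelEntry = Union[str, Dict[str, str]]


def mapping_sources(default_channels: List[ChannelEntry]) -> Dict[str, Optional[str]]:
    """Return the mapping location for each channel, or None when unset.

    Hypothetical: models the proposed `pypi-mapping` key, which pixi does
    not necessarily support.
    """
    sources: Dict[str, Optional[str]] = {}
    for entry in default_channels:
        if isinstance(entry, str):
            # Plain channel without a per-channel mapping.
            sources[entry] = None
        else:
            # An empty string is treated the same as "not configured".
            sources[entry["url"]] = entry.get("pypi-mapping") or None
    return sources
```

A channel that resolves to None here would be left to whatever default mapping behaviour applies.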

tl-hbk avatar Aug 26 '25 17:08 tl-hbk

If I set conda-pypi-map = { "conda-forge" = "map.json" }, is it OK to just add new items? Or do I need to insert the current JSON first? And where can I download the current JSON?

huxs001 avatar Oct 24 '25 04:10 huxs001