Packages downloaded from anaconda.org have unsupported filenames?
Checklist
- [X] I added a descriptive title
- [X] I searched open reports and couldn't find a duplicate
What happened?
I have a question about behavior that may be a bug in conda-package-handling, though it might be intentional.
I often download packages from anaconda.org to check their contents. I visited https://anaconda.org/conda-forge/nanoarrow/files and clicked a link to download the .conda package. This saved a file named linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda. However, running cph x linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda fails. It gives an error like:
LookupError: didn't find info-linux-64_nanoarrow-0.4.0-py310h2372a71_0 component in /mnt/c/Users/bdice/Downloads/linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda
Full traceback:
$ cph x linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda
Traceback (most recent call last):
  File "/home/bdice/miniforge3/bin/cph", line 10, in <module>
    sys.exit(main())
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/cli.py", line 121, in main
    api.extract(args.archive_path, args.dest, prefix=args.prefix)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/api.py", line 77, in extract
    format.extract(fn, dest_dir, components=components)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/conda_fmt.py", line 46, in extract
    _extract(str(fn), str(dest_dir), components=components)
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_handling/streaming.py", line 35, in _extract
    stream = package_streaming.stream_conda_component(
  File "/home/bdice/miniforge3/lib/python3.10/site-packages/conda_package_streaming/package_streaming.py", line 133, in stream_conda_component
    raise LookupError(f"didn't find {component_name} component in {filename}")
LookupError: didn't find info-linux-64_nanoarrow-0.4.0-py310h2372a71_0 component in /mnt/c/Users/bdice/Downloads/linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda
The problem is that the filename must be changed to nanoarrow-0.4.0-py310h2372a71_0.conda to match the component names, which are named like info-nanoarrow-0.4.0-py310h2372a71_0.tar.zst.
The line linked below is trying to find a component named the same way as the file.
https://github.com/conda/conda-package-handling/blob/b29610fb61647a980daf63baeed4756baced54f4/src/conda_package_handling/conda_fmt.py#L46
Is it reasonable to require that the filename matches the names of the components? I am a bit surprised that renaming the file would make it impossible to extract.
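To make the failure mode concrete, here is a minimal sketch of a filename-derived lookup. It is not the code from conda-package-handling or conda-package-streaming. It only assumes that a .conda file is a zip archive whose members are named like info-<name>.tar.zst and pkg-<name>.tar.zst, as described above. The function names `expected_component` and `find_component_by_filename` are hypothetical.

```python
import zipfile
from pathlib import Path


def expected_component(archive_path, kind="info"):
    """Build the member name implied by the archive's own filename.

    Illustrative only: mirrors the idea of naming the component after the file.
    """
    name = Path(archive_path).name
    if name.endswith(".conda"):
        name = name[: -len(".conda")]
    return f"{kind}-{name}.tar.zst"


def find_component_by_filename(archive_path, kind="info"):
    """Return the member name, or raise LookupError if the file was renamed."""
    wanted = expected_component(archive_path, kind)
    with zipfile.ZipFile(archive_path) as zf:
        if wanted not in zf.namelist():
            raise LookupError(f"didn't find {wanted} in {archive_path}")
    return wanted
```

With this scheme, an archive saved as linux-64_nanoarrow-0.4.0-py310h2372a71_0.conda is searched for info-linux-64_nanoarrow-0.4.0-py310h2372a71_0.tar.zst, which does not exist inside it, so the lookup fails even though the archive contents are intact.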
Conda Info
conda version : 23.11.0
Conda Config
channels:
- conda-forge
Conda list
conda-package-handling 2.2.0 pyh38be061_0 conda-forge
conda-package-streaming 0.9.0 pyhd8ed1ab_0 conda-forge
I think the relevant code is here: https://github.com/conda/conda-package-streaming/blob/main/conda_package_streaming/package_streaming.py#L127
It certainly could be a more flexible scheme, but just matching prefix (info-) might have some unanticipated edge cases.
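As a rough sketch of what prefix matching could look like, and of one edge case it would have to handle, the function below scans the zip members instead of deriving the name from the filename. This is an assumption-laden illustration, not a proposed patch: `find_component_by_prefix` is a hypothetical name, and the rule "exactly one match or fail" is just one possible policy.

```python
import zipfile


def find_component_by_prefix(archive_path, kind="info"):
    """Locate the '<kind>-*.tar.zst' member without consulting the filename.

    Refuses to guess when zero or several members match, which is one of the
    edge cases a prefix-only scheme would need to decide on.
    """
    prefix = f"{kind}-"
    with zipfile.ZipFile(archive_path) as zf:
        matches = [
            name
            for name in zf.namelist()
            if name.startswith(prefix) and name.endswith(".tar.zst")
        ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one {prefix}*.tar.zst member in {archive_path}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

A renamed archive resolves correctly under this approach, while an archive containing two info- members is rejected rather than silently picking one.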
> just matching prefix (info-) might have some unanticipated edge cases.
Right, that's why I wasn't sure if this was intended behavior. However, it seems like it's quite a stringent requirement for the file to have a particular name in order to be extracted properly. It's certainly not obvious that the file name should have any effect on extracting it (no other compressed format or package format has such a requirement that I am aware of).
It has to do with how the files are downloaded. The headers have issues.
Clicking the download link and letting the browser handle the download doesn't work. Copying the link from Anaconda and using another download tool (like curl or wget) does work.
There's more context in this issue: https://github.com/conda/infrastructure/issues/868
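For a file the browser has already saved under the prefixed name, the rename mentioned earlier in this thread can be scripted. The sketch below assumes the browser prepends `<subdir>_` to the real filename, which is inferred only from the single linux-64 example in this report. `KNOWN_SUBDIRS` is an illustrative, incomplete list and `strip_subdir_prefix` is a hypothetical helper.

```python
from pathlib import Path

# Illustrative subset of conda platform subdirs; extend as needed.
KNOWN_SUBDIRS = (
    "linux-64",
    "linux-aarch64",
    "linux-ppc64le",
    "osx-64",
    "osx-arm64",
    "win-64",
    "noarch",
)


def strip_subdir_prefix(path):
    """Return the path with a leading '<subdir>_' removed from the filename.

    Paths whose filename has no known prefix are returned unchanged.
    """
    p = Path(path)
    for subdir in KNOWN_SUBDIRS:
        prefix = subdir + "_"
        if p.name.startswith(prefix):
            return p.with_name(p.name[len(prefix):])
    return p
```

Renaming the download to `strip_subdir_prefix(path)` (for example with `Path(path).rename(...)`) before running cph x should give the archive the name its components expect.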
> However, it seems like it's quite a stringent requirement for the file to have a particular name in order to be extracted properly. It's certainly not obvious that the file name should have any effect on extracting it (no other compressed format or package format has such a requirement that I am aware of).
I'd second the sentiment here. Coupling the ability to uncompress with the file name is unnecessarily fragile. Aside from getting the website to serve downloads without changing names, I'd really like to see the sensitivity to filename engineered out of the format.
We will fix anaconda.org and will close this ticket when it is deployed.
This should be fixed on anaconda.org. Let us know if it is working for you.
This appears to work now! Thank you @dholth.