auditwheel icon indicating copy to clipboard operation
auditwheel copied to clipboard

Add --exclude option to auditwheel repair

Open martinRenou opened this issue 3 years ago • 13 comments

This is a follow-up of #310 by @rossant, only keeping the exclude option as suggested by @mayeut

Closes #76 Closes #241 Closes #310 Fixes #391

Original PR description:


This is a quick fix to solve an issue I have with my project and for which auditwheel seemed to help. I don't know if this approach is suitable to other users of auditwheel.

I develop a C graphics library (datoviz) that depends on Vulkan and comes with Cython bindings. https://github.com/datoviz/datoviz/issues/13 for Linux (at least Ubuntu 20.04, that would be a start).

  1. As a first step, I compile the C project and get a libdatoviz.so shared library.
  2. Then, I build the Cython library, which has a dynamic dependency to libdatoviz.
  3. I'd like to bundle both the Cython extension module, and libdatoviz, in the same wheel.
  4. I tried to use auditwheel repair. It included dozens of other dependencies in the wheel, including libvulkan and many graphics-related dependent libraries. The compiled wheel installs properly, but it doesn't work properly (there are issues related to windowing, GPU access, etc). Related: https://github.com/pypa/auditwheel/issues/241
  5. I made some changes to auditwheel (this pull request) to exclude libvulkan and a few other libraries. Some issues were fixed, but not others.
  6. As another approach, I tried to bundle just libdatoviz, and no other library in the wheel. That seems to work, at least on my computer. Chances are that the wheel may not work on other Linux-based operating systems though. Any help would be appreciated, regarding either this pull request, and/or a way to solve the issue I'm facing. Thanks!

martinRenou avatar Feb 05 '22 15:02 martinRenou

Codecov Report

Base: 92.42% // Head: 92.45% // Increases project coverage by +0.02% :tada:

Coverage data is based on head (83ae412) compared to base (73c9f17). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #368      +/-   ##
==========================================
+ Coverage   92.42%   92.45%   +0.02%     
==========================================
  Files          23       23              
  Lines        1268     1272       +4     
  Branches      311      312       +1     
==========================================
+ Hits         1172     1176       +4     
  Misses         55       55              
  Partials       41       41              
Impacted Files Coverage Δ
src/auditwheel/main_repair.py 90.76% <100.00%> (+0.14%) :arrow_up:
src/auditwheel/repair.py 86.61% <100.00%> (+0.32%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Feb 05 '22 15:02 codecov[bot]

Thanks for reviewing! I'll try to find the time to add tests and fix linting

martinRenou avatar Mar 04 '22 14:03 martinRenou

I think this is a nice addition. Any chance we can get this in sooner? 🙂

leofang avatar Mar 25 '22 01:03 leofang

@martinRenou I made a few changes to the implementation, mainly to simplify it and added tests for it. I hope you don't mind that I pushed the changes directly into your branch.

lkollar avatar Jul 18 '22 14:07 lkollar

I don't mind at all. Thanks a lot!

martinRenou avatar Jul 25 '22 09:07 martinRenou

I haven't merged this yet, as I'm still unsure if it is a good idea to exclude arbitrary libraries. On one hand, it solves a real issue, and I don't see how else we could achieve this, but it also ends up violating the manylinux spec. There is a risk that this will cause confusion and bad user experience, and it makes it more likely that users will run into ABI incompatibilities due to version mismatches between the extension and its excluded dependencies. Another concern is that if PyPI wanted to block uploads of non-conforming wheels, this feature would make it more difficult to implement it (see https://github.com/pypi/warehouse/issues/5420).

I wonder if this needs a larger discussion with the PyPA community, since the scope is bigger here than just auditwheel. If we want to properly support wheels which require some special external libraries, that could require changes in tooling in other areas as well. I don't have the bandwidth to take this on, but if someone wanted to raise this on the Packaging Discourse, that would be a good start.

Any thoughts @mayeut?

lkollar avatar Aug 16 '22 20:08 lkollar

I think this would allow abusing the purpose of auditwheel. I commented on the original issue about not pulling libOpenGL shared objects into the wheels.

mattip avatar Aug 17 '22 06:08 mattip

I wonder if this needs a larger discussion with the PyPA community, since the scope is bigger here than just auditwheel. If we want to properly support wheels which require some special external libraries, that could require changes in tooling in other areas as well. I don't have the bandwidth to take this on, but if someone wanted to raise this on the Packaging Discourse, that would be a good start.

There was PEP 668, which AIUI was also trying to help this use case of having some libraries externally managed. This is the result of a few discussions in different threads over the years (including past PyCons):

  • https://discuss.python.org/t/drawing-a-line-to-the-scope-of-python-packaging/883
  • https://discuss.python.org/t/playing-nice-with-external-package-managers/1968
  • https://discuss.python.org/t/pep-668-marking-python-base-environments-as-externally-managed/10302

jakirkham avatar Aug 17 '22 22:08 jakirkham

I commented on the original issue about not pulling libOpenGL shared objects into the wheels.

This would require a new ABI compatibility analysis to be done to ensure whitelisting is allowed per pep-600

I think this would allow abusing the purpose of auditwheel.

Yes, it would allow abusing the purpose but allowing package maintainers to use this (responsibly) is a way not to have auditwheel maintainers on the critical path of many projects. As can be seen by the activity in the repo, I do not have that much time to spend on this project, I'm not sure about @lkollar, but I guess that's about the same.

see also my other comment in https://github.com/pypa/auditwheel/pull/310#issuecomment-849773358

There are already many packages on PyPI abusing PEP600, this is no reason to do so but, IMHO, having an escape hatch is always a good thing to have. It's probably better to have this escape hatch and have the rest of auditwheel run its process than to have a bunch of hand-tagged manylinux1 wheels on PyPI.

One example of abuse (or let's say at least gray area) are GPU dependent packages. Those raised some discussion but I'm not sure there's a real conclusion yet (& the discussion starter was package size rather than pep600 compliance). For example, one of torch wheels is torch-1.12.1-cp310-cp310-manylinux1_x86_64.whl. At least one of the shared object in that wheel links against libcuda.so.1 which is part of the GPU driver and should not be grafted (nor whitelisted per strict PEP600) thus can't be repaired by auditwheel today (& "this led" to an invalid manylinux1 tag - might be another reason entirely but sure would have been a blocker).

There is a risk that this will cause confusion and bad user experience, and it makes it more likely that users will run into ABI incompatibilities due to version mismatches between the extension and its excluded dependencies.

Agreed but my comment above applies: I think the pros outweigh the cons.

Another concern is that if PyPI wanted to block uploads of non-conforming wheels, this feature would make it more difficult to implement it (see https://github.com/pypi/warehouse/issues/5420).

I think that if PyPI is to implement that feature, they should not block on missing shared objects. They can't know for sure they're not compliant unless doing a manual review. We can't even know for sure that a manylinux1 tagged wheel that embeds some manylinux2014+ only symbol versions is not in fact manylinux1 compliant although in 99.9% of cases, we'd be right to assume it should be tagged manylinux2014.

I wonder if this needs a larger discussion with the PyPA community, since the scope is bigger here than just auditwheel. If we want to properly support wheels which require some special external libraries, that could require changes in tooling in other areas as well. I don't have the bandwidth to take this on, but if someone wanted to raise this on the Packaging Discourse, that would be a good start.

While I personally think this should be merged, ~~some discussions could be raised but I don't have the bandwidth to take this on either.~~

EDIT: @rgommers is probably right about a large community-wide discussion 2 comments below this one.

mayeut avatar Oct 08 '22 15:10 mayeut

One example of abuse (or let's say at least gray area) are GPU dependent packages. Those raised some discussion but I'm not sure there's a real conclusion yet (& the discussion starter was package size rather than pep600 compliance).

Speaking from my personal experience, the whitelisting discussion here is intimately related to the package size discussion. They are the different facets of the same problem.

For GPU packages (or any packages that rely on vender-provided, proprietary prebuilt binaries) without whitelisting they only have the following options:

  1. dlopen
  2. static linking or auditwheel repair

Option 1 is technically still in violation of the auditwheel spirit, because it's still dynamic linking and could allow a non-self-contained wheel. It also requires a careful runtime check to ensure the correct shared library is dlopen'd.

Option 2, on the other hand, leads to significantly increased package sizes and, in particular for proprietary packages, potential violation of the vendor's EULA (some vendors put very strict redistribution limitation).

Finally, to adopt either option requires the downstream package providers to invest significant effort.

I am +1 for supporting whitelisting or any kind of escape hatch. The only (minor) disagreement I have with @mayeut's above comment is I see no cons, only pros 🙂

leofang avatar Oct 10 '22 18:10 leofang

This has now been proposed independently multiple times:

  • gh-76, for packages depending on PyArrow
  • gh-241 (issue) and gh-310 (PR) for graphics libraries (libOpenGL, Vulcan)
  • gh-391 for packages depending on PyTorch’s C++ API
  • In this PR, GPU packages have come up. @leofang is weighing in as a maintainer of CuPy and cuQuantum (and some more projects perhaps)
  • @mattip and I want this as a solution for a problem in NumPy and SciPy, where we now have to vendor OpenBLAS twice, and having two OpenBLAS copies included can lead to hard-to-debug performance issues
  • We will have the same needs for / issue with OpenMP, now vendored by scikit-learn, as soon as another core PyData project decides to vendor it too (you must have a single OpenMP runtime in a stack, not two)

So that's a good fraction of the most popular scientific computing, data science and ML/AI libraries - plus some visualization mixed in for good measure.

There are already many packages on PyPI abusing PEP600, this is no reason to do so but, IMHO, having an escape hatch is always a good thing to have.

I would agree with the latter part. For the former I'd say: this is not abuse, there are good technical reasons for wanting to exclude a shared library. At least four separate ones have been given:

  • for libOpenGL it was related to drivers, the shared library must come from the system rather than from a wheel to work at all
  • for PyArrow and GPU packages it's package size. Unnecessary copying of .so's that are hunderds of MB (sometimes even ~1 GB as for libtorch) is very bad. Those .so's are already present in other wheels, and can be reused.
  • for OpenBLAS and OpenMP it's needing to have a single copy of the library, rather than two or more (and with multiple name-mangled copied you do end up with multiple copies) to avoid performance or functionality issues
  • vendors with a EULA that prohibits redistribution (not just CUDA, I think MKL is also in that category)

I am all for discouraging stuffing native dependencies in wheels when it is not needed, because it doesn't fit well with the design of PyPI and Python packaging. There's lots of ways to shoot ourselves in the foot, and the author-led model of uploading new releases of versions one by one, built without a common infrastructure, will make it hard to avoid problems or correct them later.

That said, it is very clear that there is a need. So there should be an escape hatch. The users of that escape hatch are responsible for ensuring that they do the preloading correctly in order to not end up with end users with broken systems. But in the end it's also not that hard to get it right, once the correct pattern is established (which is checking the location of the Python package providing the shared library first, then preloading it - and not simply using RPATH, as discussed in several of the other issues).

There is a risk that this will cause confusion and bad user experience, and it makes it more likely that users will run into ABI incompatibilities due to version mismatches between the extension and its excluded dependencies.

Agreed but my comment above applies: I think the pros outweigh the cons.

I agree with @mayeut here. This is a little nontrivial, but it's a dependency management problem. The limited set of projects that really need this do have maintainers that have some understanding of the issues. And I'm sure it'll go wrong once - but that's why there's a "yank" button on PyPI.

I wonder if this needs a larger discussion with the PyPA community, since the scope is bigger here than just auditwheel. If we want to properly support wheels which require some special external libraries, that could require changes in tooling in other areas as well.

Unless there are very concrete issues for the projects listed above that actually want to use this (did you have any in mind @lkollar?), it'd be great to just move this forward. In the end this is a ~10 line patch + tests, and if there really is a huge showstopper that turns up later - because there is none presented so far - then it can be reconsidered or reverted at that point. Requiring someone to start a large community-wide discussion, which may then veer into "please write a PEP" or "please consider all of the original pynativelib proposal" does not seem like a fair ask. Nor a necessary one. Not even desirable probably, since we don't want to encourage everyone to start doing this - only when one really needs it, and there's no other way out.

rgommers avatar Oct 11 '22 15:10 rgommers

Thanks for your feedback @rgommers.

I edited my comment about a community-wide discussion. I think you're absolutely right on that one. I guess @mattip changed his view about not wanting this PR to be merged. I will let @lkollar some time to answer your question though.

mayeut avatar Oct 11 '22 19:10 mayeut

Yes, I changed my view as I learned more about the pain points mentioned above: practicality sometimes beats purity.

mattip avatar Oct 11 '22 21:10 mattip

I think this PR also requires an option to update the rpath as:

  • #392

did. I.e., add the relative path of the skipped libraries in rpath. Otherwise, the wheel may be broken for missing libraries.

My first thought is to change the value of --exclude colon separate value as:

auditwheel repair --exclude <soname1>:<rpath-entry1> <soname2>:<rpath-entry2> <soname3> xxx.whl

if the value ends with :<rpath-entry> then update the rpath. Otherwise, skip without any change.

For example, my own use case (exclude libtorch*.so from PyTorch):

auditwheel repair --exclude \
    'libtorch_python.so:$ORIGIN/../torch/lib' \
    ...
    # very long list of sonames
    ...
    package.whl

EDIT: Even further, I think it would be better to support glob patterns rather than specify the sonames one by one. Because the dependency libraries grow over time while new .so files can be added in new releases. The developer will need to change the auditwheel command options when its dependency releases a new version.

For example, torch provides different wheels for different user-side runtimes (e.g. cpu. cu102, cu113). And the corresponding wheels ship different .so files (e.g. cpu build does not have CUDA-related libraries). The package maintainer will need to download all version and all builds of the dependency wheels and takes a union set of the present external .so files then add them to the auditwheel repair --exclude command line.

$ /usr/bin/ldd _C.cpython-38-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007fff6d2f3000)
        libtorch_python.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_python.so (0x00007f59db972000)
        libgomp.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libgomp.so.1 (0x00007f59db939000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f59db922000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f59db8ff000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f59db8f9000)
        libstdc++.so.6 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libstdc++.so.6 (0x00007f59db743000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f59db5f4000)
        libgcc_s.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libgcc_s.so.1 (0x00007f59db5db000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59db3e9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f59dc82e000)
        libshm.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libshm.so (0x00007f59db3df000)
        libtorch.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch.so (0x00007f59db3d8000)
        libtorch_cuda.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so (0x00007f59db3b9000)
        libtorch_cuda_cpp.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007f59d5786000)
        libnvToolsExt.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libnvToolsExt.so.1 (0x00007f59d557b000)
        libtorch_cpu.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so (0x00007f59c8e47000)
        libc10_cuda.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libc10_cuda.so (0x00007f59c8df1000)
        libcudart.so.11.0 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcudart.so.11.0 (0x00007f59c8b49000)
        libcudnn.so.8 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libcudnn.so.8 (0x00007f59c891e000)
        libtorch_cuda_cu.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007f599b37d000)
        libc10.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libc10.so (0x00007f599b2e0000)
        libcusparse.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcusparse.so.11 (0x00007f598d14b000)
        libcurand.so.10 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcurand.so.10 (0x00007f598755a000)
        libcufft.so.10 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcufft.so.10 (0x00007f597eacb000)
        libmkl_intel_lp64.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007f597df2c000)
        libmkl_gnu_thread.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007f597c3a1000)
        libmkl_core.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_core.so (0x00007f5977f31000)
        libcupti-ea0c9f68.so.11.6 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libcupti-ea0c9f68.so.11.6 (0x00007f5977682000)
        libcublas.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11 (0x00007f596dc0a000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f596dc05000)
        libcublasLt.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcublasLt.so.11 (0x00007f5958676000)

I think it would be nice to have:

auditwheel repair --exclude 'torch/lib/*.so:$ORIGIN/../torch/lib' package.whl

XuehaiPan avatar Oct 17 '22 13:10 XuehaiPan

I.e., add the relative path of the skipped libraries in rpath. Otherwise, the wheel may be broken for missing libraries.

No, please do not do that, it is not going to work. See https://github.com/pypa/auditwheel/issues/391#issuecomment-1272349511 for why, and further down in that discussion for a more robust solution: https://github.com/pypa/auditwheel/issues/391#issuecomment-1272380159.

rgommers avatar Oct 17 '22 14:10 rgommers

The rpath hack does not work for system dependencies (OpenGL, CUDA, Vulkan, ...), which is a wide realm that this PR aims to cover, so please do not do that.

leofang avatar Oct 17 '22 14:10 leofang

I.e., add the relative path of the skipped libraries in rpath. Otherwise, the wheel may be broken for missing libraries.

No, please do not do that, it is not going to work. See #391 (comment) for why, and further down in that discussion for a more robust solution: #391 (comment).

The rpath hack does not work for system dependencies (OpenGL, CUDA, Vulkan, ...), which is a wide realm that this PR aims to cover, so please do not do that.

@rgommers @leofang Thanks for the clarification. It's reasonable to skip only for the system dependencies. Waiting for a more robust approach for excluding third-party libraries. Adding a wrapping layer looks better than rpath hacking. I will try my best to investigate that.

XuehaiPan avatar Oct 17 '22 14:10 XuehaiPan

Let's get this merged. Thanks @martinRenou for the PR. Thanks to all for your inputs.

mayeut avatar Oct 22 '22 07:10 mayeut

Sorry for not answering earlier @mayeut. I was going to agree with merging this though. Thanks!

lkollar avatar Oct 24 '22 21:10 lkollar