[FEA] reduce libcuvs binary size
Is your feature request related to a problem? Please describe.
As of this writing, the latest 25.04 libcuvs wheels have the following sizes:
| distribution | arch | size (compressed) | size (uncompressed) |
|---|---|---|---|
| libcuvs-cu11 | x86_64 | 948 MiB | 1377 MiB |
| libcuvs-cu11 | aarch64 | 999 MiB | 1378 MiB |
| libcuvs-cu12 | x86_64 | 1129 MiB | 1626 MiB |
| libcuvs-cu12 | aarch64 | 1128 MiB | 1628 MiB |
NOTE: v25.4.0a84, latest 25.04 nightly as of Feb 27, 2025
How I got those sizes:
Downloaded wheels from https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libcuvs-cu12 and https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libcuvs-cu11.
pydistcheck \
    --inspect \
    --output-file-size-unit Mi \
    --output-file-size-precision 1 \
    --select 'distro-too-large-compressed' \
    ./libcuvs-wheels/*.whl
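For a quick cross-check without `pydistcheck`, the same two numbers can be approximated with the Python standard library. This is only a sketch: `wheel_sizes` is a hypothetical helper name, and it assumes the wheels were downloaded to `./libcuvs-wheels/` as in the command above. The compressed size is the size of the `.whl` file on disk, and the uncompressed size is the sum of the member sizes recorded in the zip archive.

```python
import zipfile
from pathlib import Path

MIB = 1024 * 1024


def wheel_sizes(wheel_path):
    """Return (compressed, uncompressed) sizes in MiB for one wheel.

    Hypothetical helper, not part of pydistcheck.
    """
    path = Path(wheel_path)
    # Compressed size: the .whl file itself, as it would be downloaded.
    compressed = path.stat().st_size
    # Uncompressed size: sum of every archive member's original size.
    with zipfile.ZipFile(path) as zf:
        uncompressed = sum(info.file_size for info in zf.infolist())
    return compressed / MIB, uncompressed / MIB


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the wheels live.
    for whl in sorted(Path("./libcuvs-wheels").glob("*.whl")):
        c, u = wheel_sizes(whl)
        print(f"{whl.name}: {c:.1f} MiB compressed, {u:.1f} MiB uncompressed")
```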
Packages this large take a lot of network bandwidth to download and a lot of disk space to store.
This issue tracks the work of reducing those sizes.
Describe the solution you'd like
Those package sizes should be reduced. It's tough to say how small is small enough, but the goal of getting wheels onto pypi.org provides one set of targets.
PyPI requirements for individual files:
- over 100 MiB = requires an exception (see https://github.com/pypi/support/issues)
- over 1 GiB = absolutely not allowed
There are also limits for the total size of a project, summed over all releases... so smaller packages = more releases that can be hosted on PyPI.
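To make those targets concrete, here is a small sketch that checks the compressed sizes from the table above against the two per-file thresholds listed in this issue. The function name `pypi_status` and the status strings are made up for illustration, and the thresholds are taken from this issue rather than read from PyPI, so they may not match PyPI's current policy.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Per-file thresholds as stated in this issue (assumption: still current).
EXCEPTION_LIMIT = 100 * MIB
HARD_LIMIT = 1 * GIB


def pypi_status(size_bytes):
    """Classify a file size against the thresholds above (hypothetical helper)."""
    if size_bytes > HARD_LIMIT:
        return "not allowed"
    if size_bytes > EXCEPTION_LIMIT:
        return "needs exception"
    return "ok"


# Compressed wheel sizes in MiB, copied from the table in this issue.
compressed_mib = {
    ("libcuvs-cu11", "x86_64"): 948,
    ("libcuvs-cu11", "aarch64"): 999,
    ("libcuvs-cu12", "x86_64"): 1129,
    ("libcuvs-cu12", "aarch64"): 1128,
}

for (dist, arch), size in compressed_mib.items():
    print(f"{dist} {arch}: {size} MiB -> {pypi_status(size * MIB)}")
```

With these numbers, both `libcuvs-cu11` wheels land in the "needs exception" range, while both `libcuvs-cu12` wheels exceed 1 GiB (1024 MiB).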
Describe alternatives you've considered
Some ideas for how to address this:
- compiling fewer combinations of templates
  - https://github.com/rapidsai/cuvs/issues/110
  - similar change in cugraph: https://github.com/rapidsai/cugraph/pull/4720
- tuning compiler flags to optimize for binary size
- checking the wheel contents and removing any extraneous files found
- replacing static linking / vendoring with dynamic linking where possible
  - e.g., relying on `nvidia-nccl-cu{11,12}` wheels instead of vendoring a copy of `libnccl.so`
- code changes to avoid recompiling the same kernels multiple times
  - related conversation: https://github.com/rapidsai/cuvs/issues/634
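As a starting point for the "checking the wheel contents" idea, a sketch like the following could show where the bytes in a wheel actually go. The helper names `largest_members` and `size_by_suffix` are hypothetical, and this only uses the standard-library `zipfile` module; it reports uncompressed member sizes and does not look inside shared libraries.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def largest_members(wheel_path, top=10):
    """Return the `top` largest files in a wheel as (name, uncompressed bytes)."""
    with zipfile.ZipFile(wheel_path) as zf:
        infos = sorted(zf.infolist(), key=lambda i: i.file_size, reverse=True)
    return [(i.filename, i.file_size) for i in infos[:top]]


def size_by_suffix(wheel_path):
    """Total uncompressed bytes per file suffix, e.g. '.so' or '.h'.

    Note: a versioned library such as 'libfoo.so.1' is counted under '.1'.
    """
    totals = Counter()
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            suffix = PurePosixPath(info.filename).suffix or "(none)"
            totals[suffix] += info.file_size
    return totals
```

Calling `largest_members("some.whl")` on one of the downloaded wheels would list the heaviest files first, which helps separate "one huge shared library" from "many stray headers or static archives" before deciding which of the ideas above to pursue.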
Additional context
Attaching a report @robertmaynard put together showing the approximate contribution to total binary size of each CUDA kernel in libcuvs.