
[FEA] reduce libcuvs binary size


Is your feature request related to a problem? Please describe.

As of this writing, the latest 25.04 libcuvs wheels have the following sizes:

| distribution | arch    | size (compressed) | size (uncompressed) |
|--------------|---------|-------------------|---------------------|
| libcuvs-cu11 | x86_64  | 948 MiB           | 1377 MiB            |
| libcuvs-cu11 | aarch64 | 999 MiB           | 1378 MiB            |
| libcuvs-cu12 | x86_64  | 1129 MiB          | 1626 MiB            |
| libcuvs-cu12 | aarch64 | 1128 MiB          | 1628 MiB            |

NOTE: v25.4.0a84, latest 25.04 nightly as of Feb 27, 2025

How I got those sizes:

Downloaded wheels from https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libcuvs-cu12 and https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libcuvs-cu11.

```shell
pydistcheck \
  --inspect \
  --output-file-size-unit Mi \
  --output-file-size-precision 1 \
  --select 'distro-too-large-compressed' \
  ./libcuvs-wheels/*.whl
```

Such large packages require a relatively large amount of network bandwidth to download and disk space to store.

This issue tracks the work of reducing those sizes.

Describe the solution you'd like

Those package sizes should be reduced. It's hard to say how small is small enough, but the goal of getting these wheels onto pypi.org provides one set of targets.

PyPI requirements for individual files:

  • over 100 MiB = requires an exception (see https://github.com/pypi/support/issues)
  • over 1 GiB = absolutely not allowed

There are also limits on the total size of a project, summed over all releases, so smaller packages mean more releases can be hosted on PyPI.

Describe alternatives you've considered

Some ideas for how to address this:

  • compiling fewer combinations of templates (see the first sketch after this list)
    • https://github.com/rapidsai/cuvs/issues/110
    • similar change in cugraph: https://github.com/rapidsai/cugraph/pull/4720
  • tuning compiler flags to optimize for binary size
  • checking the wheel contents and removing any extraneous files found
  • replacing static linking / vendoring with dynamic linking where possible
    • e.g., relying on nvidia-nccl-cu{11,12} wheels instead of vendoring a copy of libnccl.so
  • code changes to avoid recompiling the same kernels multiple times (see the second sketch after this list)
    • related conversation: https://github.com/rapidsai/cuvs/issues/634
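
The first sketch below illustrates the "fewer template combinations" idea in isolation: a hypothetical kernel template is explicitly instantiated only for a canonical 64-bit index type, and callers using other index types convert at the call boundary instead of pulling in a second full instantiation. The names (`search_impl`, `search_int32`) are made up for illustration and are not the cuVS API; this is just one shape the technique can take.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel template: without intervention this gets compiled
// once per index type that callers use (int32_t, int64_t, uint32_t, ...).
template <typename IdxT>
__global__ void search_impl(const float* queries, IdxT* neighbors, int k)
{
  // ... kernel body elided ...
}

// Compile only the canonical 64-bit instantiation into the library.
template __global__ void search_impl<int64_t>(const float*, int64_t*, int);

// 32-bit entry point: convert at the call boundary instead of compiling
// a second full kernel instantiation.
void search_int32(const float* queries, int32_t* neighbors, int n_queries, int k)
{
  int64_t* tmp = nullptr;
  cudaMalloc(&tmp, sizeof(int64_t) * static_cast<size_t>(n_queries) * k);
  search_impl<int64_t><<<n_queries, 256>>>(queries, tmp, k);
  // ... narrow tmp back into neighbors (error handling elided) ...
  cudaFree(tmp);
}
```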

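The second sketch illustrates the "avoid recompiling the same kernels" idea with the standard `extern template` pattern: the header declares the instantiations, so every translation unit that includes it reuses one compiled copy instead of emitting its own. Again, the names and file layout are hypothetical, not taken from the cuVS codebase.

```cpp
// pairwise_distance.hpp -- included by many translation units.
#pragma once

template <typename T>
void pairwise_distance(const T* a, const T* b, T* out, int n);

// Suppress implicit instantiation in every includer; exactly one
// source file (below) provides the compiled definitions.
extern template void pairwise_distance<float>(const float*, const float*, float*, int);
extern template void pairwise_distance<double>(const double*, const double*, double*, int);
```

```cpp
// pairwise_distance.cpp (or .cu) -- the only place the template is compiled.
#include "pairwise_distance.hpp"

template <typename T>
void pairwise_distance(const T* a, const T* b, T* out, int n)
{
  for (int i = 0; i < n; ++i) { out[i] = a[i] - b[i]; }  // placeholder body
}

// Explicit instantiations: one object file carries this code.
template void pairwise_distance<float>(const float*, const float*, float*, int);
template void pairwise_distance<double>(const double*, const double*, double*, int);
```
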
Additional context

Attaching a report that @robertmaynard put together, showing the approximate contribution of each CUDA kernel to the total binary size of libcuvs.

size.log

jameslamb · Feb 27, 2025