[Bridge] Missing Binary Packages / Updated Install Instructions
Pre-checks
- [x] I searched existing issues
- [x] I’m using the latest NexaSDK release
What happened?
From version 1.0.37 onwards, the Python binaries are missing from PyPI. Instead, the source code is uploaded.
It would be helpful to fix the relevant workflows so that binary versions are published correctly, and/or to provide instructions for compiling the package for CUDA and Metal on end-user devices.
Steps to reproduce
pip install nexaai==1.0.37
Logs (Selected)
Collecting nexaai
Downloading nexaai-1.0.37.tar.gz (61 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nexaai
Building wheel for nexaai (pyproject.toml) ... done
Created wheel for nexaai: filename=nexaai-1.0.37-py3-none-any.whl size=269665019 sha256=9d2e287685927f53025401705d289328e00e4f6f581ce7736e935068e87bdca1
Successfully built nexaai
NexaSDK version
Nexa SDK Bridge 1.0.37
Install method
pip
OS and version
All
Hardware / accelerator
All
Looking at the source package's pyproject.toml and setup.py, I can see that there are [cuda] and [mlx] extras that should be documented here.
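Assuming these are ordinary pip extras declared under optional dependencies, installation would presumably look like:

```
pip install "nexaai[cuda]"   # NVIDIA CUDA backend
pip install "nexaai[mlx]"    # Apple Metal / MLX backend
```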
There is also a download_and_extract() function in setup.py that obtains precompiled binaries from https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public/nexasdk/v{version}/{os_name}_{arch}/{ARCHIVE}. Whether this precludes uploading binaries to PyPI is unclear.
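For other users reading along, the fetch logic appears to reduce to roughly the following (a sketch based on reading setup.py; the actual archive name, extraction layout, and OS/arch normalization may differ):

```python
# Rough sketch of the install-time fetch in setup.py (details may differ).
import platform
import tarfile
import urllib.request

BASE_URL = "https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public/nexasdk"

def download_and_extract(version: str, archive: str, dest: str = ".") -> None:
    """Download the precompiled archive for this OS/arch and unpack it."""
    os_name = platform.system().lower()   # e.g. "windows", "linux", "darwin"
    arch = platform.machine().lower()     # e.g. "x86_64", "arm64"
    url = f"{BASE_URL}/v{version}/{os_name}_{arch}/{archive}"
    local_path, _ = urllib.request.urlretrieve(url)
    with tarfile.open(local_path) as tf:  # assumes a tarball; may be a zip
        tf.extractall(dest)
```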
Hi @iwr-redmond, thanks for reporting this.
Starting from v1.0.37, we intentionally switched the Python package distribution from prebuilt wheels to an sdist. Previously, maintaining PyPI wheels across different OSes, architectures, and accelerators led to incomplete platform coverage (for example, the Linux platform was hard to support).
With the current approach, we are able to support Windows, Linux, and macOS on both x64 and arm64 in a more consistent way, without maintaining a large and fragile PyPI wheel matrix.
The PyPI package itself is now source-only. During installation, the setup script automatically downloads the appropriate precompiled native binaries from our public model hub (the same public storage referenced in this repo, as shown in runner/Makefile), based on the detected OS and architecture.
We agree that this behavior should be documented more clearly. In particular:
- The Metal / MLX setup should be explicitly documented.
- For CUDA, we intentionally avoid requiring additional manual steps from users; the installer performs platform and capability checks automatically.
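For illustration, the kind of capability check mentioned above is of this general flavor (illustrative only, not our exact implementation):

```python
# Illustrative only: a coarse install-time CUDA availability probe.
import shutil

def cuda_available() -> bool:
    # nvidia-smi on PATH is a common proxy for a usable NVIDIA driver;
    # a real check would also verify driver and CUDA toolkit versions.
    return shutil.which("nvidia-smi") is not None
```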
We will update the installation documentation to clarify these points. Thanks again for bringing this up.
You may wish to consider taking the infrastructure you have created and using GitHub Actions to "compile" (really just assemble) the five architectures into binary wheels for secondary upload to PyPI. As there would be no C++ compilation involved, the resources required would be minimal. This would be similar to what Nomic used to do for GPT4All. As you can see here, only three binaries were required for full coverage in their case.
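For concreteness, such a workflow might look roughly like this (the runner labels, the NEXA_TARGET_PLATFORM hook, and the five-way matrix are purely illustrative):

```yaml
# Hypothetical sketch: wrap prebuilt binaries into platform wheels in CI.
name: assemble-wheels
on:
  release:
    types: [published]
jobs:
  assemble:
    strategy:
      matrix:
        include:
          - { runner: windows-latest,   platform: windows_x86_64 }
          - { runner: ubuntu-latest,    platform: linux_x86_64 }
          - { runner: ubuntu-24.04-arm, platform: linux_arm64 }
          - { runner: macos-13,         platform: macos_x86_64 }
          - { runner: macos-latest,     platform: macos_arm64 }
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install build
      # NEXA_TARGET_PLATFORM is a made-up hook standing in for however
      # setup.py would be told which prebuilt archive to bundle.
      - run: python -m build --wheel
        env:
          NEXA_TARGET_PLATFORM: ${{ matrix.platform }}
      - uses: actions/upload-artifact@v4
        with:
          name: wheel-${{ matrix.platform }}
          path: dist/*.whl
```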
Thanks for the suggestion — we did consider publishing secondary binary wheels via GitHub Actions.
The current sdist-based approach is a deliberate design choice, and from an end-user perspective the installation experience is effectively unchanged:
- Although the package is distributed as an sdist on PyPI, all native binaries are already precompiled. During `pip install`, the installer downloads the appropriate binary, assembles a local wheel, and installs it. No C/C++ compilation happens on the user's machine, and the total download size is comparable to our previous wheel-based releases. In practice, this behaves the same as installing a prebuilt wheel.
- The native binaries downloaded at install time are produced and published automatically by our GitHub Actions CI/CD pipeline. The same pipeline builds the artifacts for all supported platforms and uploads them to our public object storage, ensuring the process is reproducible and versioned rather than manually managed. The binary artifacts are built from the same commit as the PyPI release tag.
- By distributing via sdist, the PyPI artifact itself contains no platform-specific binaries. This lets us avoid publishing and maintaining a large matrix of wheels with OS, architecture, and Python-version tags. This is particularly important for us because the native layer uses pybind11 APIs that are not compatible with the stable ABI (`abi3`). As a result, any wheel-based distribution would require Python-version-specific tags, significantly increasing the number of artifacts we would need to manage.
- Prior to this change, when we embedded binaries directly into PyPI wheels, we ran into several practical limitations:
  a. Each wheel had to be tagged with OS, architecture, and Python version (because the wheels could not target the stable ABI), which limited platform coverage and slowed our ability to provide day-0 support for new platforms, models, and features.
  b. PyPI's project storage limit (15 GB without an approved quota increase) forced us to reduce release frequency and remove older versions to stay within it. By hosting binaries in public object storage and downloading them at install time, we can retain significantly more historical versions and ship updates at a much higher cadence. The current sdist artifact itself is ~65 KB.
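To make the tag-matrix point concrete: without the stable ABI, each platform needs a separate wheel per Python minor version (the filenames below are hypothetical), whereas the sdist route builds a single `py3-none-any` wheel locally, as the install log above shows:

```
nexaai-1.0.37-cp310-cp310-manylinux_2_17_x86_64.whl
nexaai-1.0.37-cp311-cp311-manylinux_2_17_x86_64.whl
nexaai-1.0.37-cp312-cp312-manylinux_2_17_x86_64.whl
nexaai-1.0.37-cp312-cp312-win_amd64.whl
nexaai-1.0.37-cp312-cp312-macosx_11_0_arm64.whl
...
```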
Given these constraints, the current model gives us full PC platform coverage (Windows / Linux / macOS on x64 and arm64), faster iteration, and sustainable distribution, while preserving the same installation experience for users.
That said, we agree this architecture should be clearly documented, and we are updating the installation docs accordingly.