nexa-sdk [Bridge] Missing Binary Packages / Updated Install Instructions

Pre-checks

[x] I searched existing issues
[x] I’m using the latest NexaSDK release

What happened?

From version 1.0.37 onwards, the Python binaries are missing from PyPI. Instead, the source code is uploaded.

It would be helpful to fix the release workflows so that binary wheels are published correctly, to provide instructions for compiling the package for CUDA and Metal on end-user devices, or both.
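One way to confirm which artifact types a given release ships is to look at the `urls` array that the PyPI JSON API returns for that version. The following is a minimal sketch, not part of the SDK: the helper names `artifact_types` and `fetch_release` are hypothetical, and the sample payload is a hand-written stand-in shaped like a source-only release, not a captured PyPI response.

```python
import json
from urllib.request import urlopen


def artifact_types(release_json: dict) -> set:
    """Return the packagetype values ("sdist", "bdist_wheel") found in a release."""
    return {f["packagetype"] for f in release_json.get("urls", [])}


def fetch_release(project: str, version: str) -> dict:
    """Fetch release metadata from the PyPI JSON API (requires network access)."""
    with urlopen(f"https://pypi.org/pypi/{project}/{version}/json") as resp:
        return json.load(resp)


# Hand-written sample shaped like a source-only release; not real PyPI output.
sample = {"urls": [{"filename": "nexaai-1.0.37.tar.gz", "packagetype": "sdist"}]}
print(artifact_types(sample))
```

A release that only lists `sdist` here has no wheels on PyPI, which matches the pip log below where a `.tar.gz` is downloaded and a wheel is built locally. Calling `artifact_types(fetch_release("nexaai", "1.0.37"))` would run the same check against the live index.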

Steps to reproduce

pip install nexaai==1.0.37

Logs (Selected)

Collecting nexaai
  Downloading nexaai-1.0.37.tar.gz (61 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nexaai
  Building wheel for nexaai (pyproject.toml) ... done
  Created wheel for nexaai: filename=nexaai-1.0.37-py3-none-any.whl size=269665019 sha256=9d2e287685927f53025401705d289328e00e4f6f581ce7736e935068e87bdca1
Successfully built nexaai

NexaSDK version

Nexa SDK Bridge 1.0.37

Install method

pip

OS and version

All

Hardware / accelerator

All

Dec 19 '25 04:12 iwr-redmond

Looking at the source package's pyproject.toml and setup.py, I can see that there are [cuda] and [mlx] options that should be documented here.

There is also a download_and_extract() function in setup.py that obtains precompiled binaries from https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/public/nexasdk/v{version}/{os_name}_{arch}/{ARCHIVE}. Whether this precludes uploading binaries to PyPI is unclear.
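To make the URL pattern concrete, here is a rough sketch of how such a download URL could be assembled from the detected platform. It is not the code from setup.py: the function name `binary_url`, the OS and architecture normalisation tables, and the archive name `example.zip` are all assumptions made for illustration, since the thread only gives the template itself.

```python
import platform

# Template quoted in the thread; {archive} stands in for the elided {ARCHIVE}.
URL_TEMPLATE = (
    "https://nexa-model-hub-bucket.s3.us-west-1.amazonaws.com/"
    "public/nexasdk/v{version}/{os_name}_{arch}/{archive}"
)

# Assumed normalisation tables; the real setup.py may use different names.
OS_NAMES = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
ARCHES = {
    "x86_64": "x86_64",
    "AMD64": "x86_64",
    "arm64": "arm64",
    "aarch64": "arm64",
    "ARM64": "arm64",
}


def binary_url(version, archive, system=None, machine=None):
    """Build a download URL for the current (or a given) OS and architecture."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        os_name = OS_NAMES[system]
        arch = ARCHES[machine]
    except KeyError as exc:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from exc
    return URL_TEMPLATE.format(
        version=version, os_name=os_name, arch=arch, archive=archive
    )


# "example.zip" is a placeholder archive name, not a real artifact.
print(binary_url("1.0.37", "example.zip", system="Linux", machine="x86_64"))
```

Passing `system` and `machine` explicitly keeps the function testable on any host; leaving them out falls back to `platform.system()` and `platform.machine()`.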

Dec 19 '25 04:12 iwr-redmond

Hi @iwr-redmond, thanks for reporting this.

Starting from v1.0.37, we intentionally switched the Python package distribution from prebuilt wheels to sdist. Previously, maintaining PyPI wheels across different OSes, architectures, and accelerators led to incomplete platform coverage (Linux, for example, was hard to support).

With the current approach, we are able to support Windows, Linux, and macOS on both x64 and arm64 in a more consistent way, without maintaining a large and fragile PyPI wheel matrix.

The PyPI package itself is now source-only. During installation, the setup script automatically downloads the appropriate precompiled native binaries from our public model hub (the same public repository referenced in this repo, as shown in runner/Makefile), based on the detected OS and architecture.

We agree that this behavior should be documented more clearly. In particular:

The Metal / MLX setup should be explicitly documented.
For CUDA, we intentionally avoid requiring additional manual steps from users; the installer performs platform and capability checks automatically.

We will update the installation documentation to clarify these points. Thanks again for bringing this up.

Dec 19 '25 14:12 mengshengwu

You may wish to consider taking the infrastructure you have created and using GitHub Actions to "compile" (really just assemble) the five architectures into binary wheels for secondary upload to PyPI. As there would not be any C++ compilation, the resources involved would be minimal. This would be similar to what Nomic used to do for GPT4All. As you can see here, only three binaries were required for full coverage in their case.

Dec 19 '25 16:12 iwr-redmond

Thanks for the suggestion — we did consider publishing secondary binary wheels via GitHub Actions.

The current sdist-based approach is a deliberate design choice, and from an end-user perspective the installation experience is effectively unchanged:

1. Although the package is distributed as an sdist on PyPI, all native binaries are already precompiled. During pip install, the installer downloads the appropriate binary, assembles a local wheel, and installs it. No C/C++ compilation happens on the user’s machine, and the total download size is comparable to our previous wheel-based releases. In practice, this behaves the same as installing a prebuilt wheel.
2. The native binaries downloaded at install time are produced and published automatically via our GitHub Actions CI/CD pipeline. The same pipeline builds the artifacts for all supported platforms and uploads them to our public object storage, ensuring the process is reproducible and versioned, rather than manually managed. The binary artifacts are built from the same commit as the PyPI release tag.
3. By distributing via sdist, the PyPI artifact itself contains no platform-specific binaries. This allows us to avoid publishing and maintaining a large matrix of wheels with OS, architecture, and Python-version tags. This is particularly important for us because the native layer uses pybind11 APIs that are not compatible with the CPython stable ABI (abi3). As a result, any wheel-based distribution would require Python-version-specific tags, significantly increasing the number of artifacts we would need to manage.
4. Prior to this change, when we embedded binaries directly into PyPI wheels, we ran into several practical limitations:
   a. Each wheel had to be tagged with OS, architecture, and Python version (because the native layer does not use the stable ABI), which limited platform coverage and slowed our ability to provide day-0 support for new platforms, models, and features.
   b. PyPI’s project storage limit (15 GB, without an approved quota increase) forced us to reduce release frequency and remove older versions to stay within the limit.
   By hosting binaries in public object storage and downloading them at install time, we can now retain significantly more historical versions and ship updates at a much higher cadence. The current sdist artifact itself is ~65 KB.
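The size of the wheel matrix described above can be illustrated with a quick count. This is a back-of-the-envelope sketch under stated assumptions: the list of CPython versions is a guess, accelerator variants are ignored, and the tag strings are simplified labels rather than real PEP 425 platform tags.

```python
from itertools import product

# Illustrative values only; the project's actual support matrix may differ.
OSES = ["windows", "linux", "macos"]
ARCHES = ["x64", "arm64"]
PYTHONS = ["cp310", "cp311", "cp312", "cp313"]  # assumed CPython versions


def wheel_matrix(oses, arches, pythons):
    """List one simplified label per (OS, architecture, Python version) wheel."""
    return [
        f"{py}-{os_name}_{arch}"
        for os_name, arch, py in product(oses, arches, pythons)
    ]


per_release = wheel_matrix(OSES, ARCHES, PYTHONS)
print(len(per_release))  # 3 OSes * 2 architectures * 4 Python versions = 24

# A Python-version-independent bundle per platform needs only one entry each.
print(len(wheel_matrix(OSES, ARCHES, ["any"])))  # 3 * 2 * 1 = 6
```

Under these assumed numbers, every release would need 24 wheels instead of 6 platform bundles, and each newly supported Python version would add another 6.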

Given these constraints, the current model gives us full PC platform coverage (Windows / Linux / macOS on x64 and arm64), faster iteration, and sustainable distribution, while preserving the same installation experience for users.

That said, we agree this architecture should be clearly documented, and we are updating the installation docs accordingly.

Dec 20 '25 05:12 mengshengwu