Use cibuildwheel (setup and options as in, e.g., https://github.com/benfred/implicit/blob/main/pyproject.toml) to build wheels with the C++/CUDA code precompiled?
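
A rough sketch of what such a configuration might look like, written as a small Python script that renders the relevant pyproject.toml tables. The option names (build, skip, test-requires, test-command, archs, before-all) are real cibuildwheel options, but every value here is an assumption for illustration and is not copied from the linked project; in particular ci/install_cuda.sh is a hypothetical script that would install the CUDA toolkit inside the build container.

```python
# Sketch: render a hypothetical cibuildwheel section for pyproject.toml.
# All values are illustrative assumptions, not taken from any real project.


def toml_value(value):
    """Format a str or a list of str as a TOML value."""
    if isinstance(value, list):
        return "[" + ", ".join(toml_value(v) for v in value) + "]"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_table(name, options):
    """Render one TOML table: a [name] header followed by key = value lines."""
    lines = [f"[{name}]"]
    lines += [f"{key} = {toml_value(val)}" for key, val in options.items()]
    return "\n".join(lines)


# Options shared by all platforms (assumed values).
CIBW_COMMON = {
    "build": "cp310-* cp311-* cp312-*",        # which CPython versions to build
    "skip": "*-musllinux_*",                   # skip musl-based Linux wheels
    "test-requires": "pytest",                 # installed before running tests
    "test-command": "pytest {project}/tests",  # run against the built wheel
}

# Linux-only options (assumed values).
CIBW_LINUX = {
    "archs": ["x86_64"],
    # Hypothetical script that installs the CUDA toolkit in the container.
    "before-all": "bash {project}/ci/install_cuda.sh",
}


def render_cibuildwheel_config():
    """Return both tables as text ready to paste into pyproject.toml."""
    tables = [
        render_table("tool.cibuildwheel", CIBW_COMMON),
        render_table("tool.cibuildwheel.linux", CIBW_LINUX),
    ]
    return "\n\n".join(tables) + "\n"


if __name__ == "__main__":
    print(render_cibuildwheel_config())
```

Running the script prints the two tables; the wheels themselves would then be built by invoking cibuildwheel in CI. Whether the CUDA toolkit is best installed via before-all or baked into a custom build image is an open choice that this sketch does not settle.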