
"Can't initialize NVML" on system with CUDA 13.0 and Maxwell GPU

Open wired-filipino-owl opened this issue 1 month ago • 3 comments

Describe the bug
Docker command: docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu
Kokoro falls back to using the CPU. That works, but I wanted to use my GPU. As I understand it, CUDA is somewhat backwards compatible, so my CUDA 13.0 system should be able to run a CUDA 12.8 program fine, yes?
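The backwards-compatibility expectation is reasonable on the driver side: a driver that reports CUDA X.Y can generally run binaries built against any toolkit version at or below X.Y. A minimal sketch of that rule, using the version numbers from this report:

```python
# Sketch of the NVIDIA driver/runtime compatibility rule assumed above:
# a driver advertising CUDA X.Y can run applications built against any
# toolkit <= X.Y. (Note it says nothing about GPU architecture support.)

def driver_can_run(driver_cuda, app_cuda):
    """True if the driver's reported CUDA version covers the app's toolkit."""
    return driver_cuda >= app_cuda

# Versions from this report: host driver reports 13.0, image built for 12.8.
print(driver_can_run((13, 0), (12, 8)))  # True: the driver is not the blocker
```

So the driver/toolkit mismatch alone would not explain the CPU fallback.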

I was able to install and use Open-WebUI and its bundled Ollama with my GPU just fine.

Screenshots or console output

2025-10-30 01:23:43.851 | INFO     | __main__:download_model:60 - Model files already exist and are valid
/app/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:734: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
INFO:     Started server process [30]

Branch / Deployment used
Docker local on a headless machine.
Docker command: docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu

Operating System

me@my-box:~> cat /etc/os-release 
NAME="openSUSE Leap"
VERSION="15.6"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
#Linux kernel version
me@my-box:~> uname -rp
6.4.0-150600.23.73-default x86_64
me@my-box:~> sudo nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Wed Oct 29 18:52:11 2025
Driver Version                            : 580.95.05
CUDA Version                              : 13.0

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : Quadro K620
    Product Brand                         : Quadro
    Product Architecture                  : Maxwell
me@my-box:~> docker --version
Docker version 28.3.3-ce, build bea959c7b

Additional context
Installed the NVIDIA proprietary driver and nvidia-container-toolkit using https://en.opensuse.org/SDB:NVIDIA_drivers

wired-filipino-owl avatar Oct 30 '25 02:10 wired-filipino-owl

I'd guess that PyTorch + the Maxwell GPU architecture is the problem. The Kokoro-FastAPI project uses PyTorch 2.8.0 with CUDA 12.9.

https://github.com/remsky/Kokoro-FastAPI/blob/88dcf00e4fc622b12eeb271e6f56aff860229646/pyproject.toml#L46

Unfortunately, PyTorch removed support for the Maxwell and Pascal architectures in its CUDA 12.8 and 12.9 builds, though it still offers CUDA 12.6 builds that support them. Additionally, CUDA 13 has deprecated library support for Maxwell.
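To make the architecture point concrete: a PyTorch wheel can only serve a GPU whose SM architecture was compiled into it (or a compatible PTX target, which only helps newer GPUs). The arch lists below are illustrative assumptions, not the actual wheel contents; on a real install you would inspect torch.cuda.get_arch_list().

```python
# Illustrative sketch (arch lists here are assumptions, not the real wheel
# contents; check torch.cuda.get_arch_list() on an actual install).
# The Quadro K620 is Maxwell, compute capability 5.0 (sm_50).

QUADRO_K620 = (5, 0)

# Assumed compiled-binary targets for the two wheel families discussed:
CU126_ARCHS = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
CU128_ARCHS = ["sm_70", "sm_75", "sm_80", "sm_86", "sm_90", "sm_100", "sm_120"]

def has_binary_for(capability, arch_list):
    """Simplified check: does the build ship a binary for this exact SM?
    (Ignores PTX forward-JIT, which targets newer GPUs, not older ones.)"""
    return "sm_%d%d" % capability in arch_list

print(has_binary_for(QUADRO_K620, CU126_ARCHS))  # True
print(has_binary_for(QUADRO_K620, CU128_ARCHS))  # False -> CPU fallback
```

With no sm_50 binary in the CUDA 12.8/12.9 wheels, PyTorch cannot use the K620 and silently falls back to CPU, which matches the NVML warning followed by a CPU run.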

You could try building a container with this change: #407

ryan-steed-usa avatar Oct 30 '25 06:10 ryan-steed-usa

@wired-filipino-owl were you able to test this fix?

ryan-steed-usa avatar Nov 05 '25 04:11 ryan-steed-usa

@wired-filipino-owl I've spent some time producing working builds and containers from my fork. The latest builds include the changes from the master branch merged with the changes from this PR. If you get a chance to test, please reply. Thanks!

docker run --gpus all -p 8880:8880 ghcr.io/ryan-steed-usa/kokoro-fastapi-gpu:latest

ryan-steed-usa avatar Nov 11 '25 20:11 ryan-steed-usa

@ryan-steed-usa thank you! I will test when I get around to ripping out CUDA 13.0 and downgrading to 12.6 on my OpenSUSE machine. Swapping CUDA versions is an involved process 😓

wired-filipino-owl avatar Dec 15 '25 00:12 wired-filipino-owl