How do I install this on Windows?
Is this installable on Windows, or is it only for Linux?
Yes, but you need to install Docker first.
I routinely use FastKoko (the WebUI) under Docker on Win 11.
It can be run on Windows and Linux, either through a Docker container or natively. (Note that for Windows there is no start script, but it can be done roughly the same way as on Linux.)
So I installed Docker Desktop.
I ran the following:
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
cd docker/gpu  # For GPU support
docker compose up --build
Then, after it downloaded the models and attempted to run, I got this error. Does anyone know what to do here? I see it's using CUDA 12.8.
| RuntimeError: CUDA error: no kernel image is available for execution on the device
| CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| For debugging consider passing CUDA_LAUNCH_BLOCKING=1
| Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I ran python -c "import torch; print(torch.__version__)"
which returned 2.6.0+cu126
I am on a 50-series NVIDIA GPU, so could it be that PyTorch needs to be 2.8.0+cu128, and that's why I get this error?
If so, how do I update PyTorch to 2.8 from within the kokoro env?
Why do the build?
From README.md:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2 #NVIDIA GPU
Substitute "latest" for "v0.2.2". I add "127.0.0.1:" in front of the port numbers. On my system 0.0.0.0 doesn't work but tthat probably only a local issue.
Overall, the biggest problem is keeping track of the correct working version. IMNSHO this repo's version control is a trainwreck. }:(
Do you have any suggestions to make it better? I agree that the version control could use some work.
@fireblade2534 I'm not really a coder, mostly an end user. That being said, as I've said here several times, VERSION and README.md both have old version numbers, and neither agrees with the Release version number. Adding a revision number to the WebUI page wouldn't hurt.
The release number on the Code page needs to be tied to a stable release branch. I suggest using "v0.2.4" to catch up with the stable changes introduced over the past couple of weeks, and freezing v0.2.4 as a formal release until another release seems reasonable. Nightlies exist for the "let's try this and see what happens" changes.
As a general comment, I suspect some requests for changes, or attempts at changes, really belong to developing Kokoro-82M; changes here may be "the tail wagging the dog".
As an end user, I use Fast-Koko to read back various writing projects I'm working on (as a writer). In some cases there's more than one language (e.g., I just finished a short story with German mixed into it). In a perfect world, it'd be nice to hear the German read back as well as the English.
Unfortunately, it appears that Kokoro-82M isn't going to support custom voices. Zonos, F5-TTS, et al. are great (in varying degrees) for a few seconds of material. After that, "pay here, please". Sure would be nice to hear one of my Westerns read back in John Wayne's voice, though. ;D
The code page shows v0.2.3 as the current release but...
PS C:\Users\pavil> docker run --gpus all -p 127.0.0.1:8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.3
Unable to find image 'ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.3' locally
docker: Error response from daemon: failed to resolve reference "ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.3": ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.3: not found.
Of course I can use "latest" instead, but that risks pulling down a nightly (which may or may not be stable) rather than 0.2.3. (And VERSION and README.md remain out of date...)
ADDED: And while I'm being irritable, please see the following snippet from the startup:
Model warmed up on cuda: kokoro_v1
CUDA: True
67 voice packs loaded
Beta Web Player: http://0.0.0.0:8880/web/
or http://localhost:8880/web/
░░░░░░░░░░░░░░░░░░░░░░░░
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8880 (Press CTRL+C to quit)
At least here, 0.0.0.0 doesn't work. When I use docker run to create a container, I use 127.0.0.1 in the command. The startup message should reflect the selected IP. In a perfect world, I'd default to 127.0.0.1 in place of 0.0.0.0, but that's just me.
The version thread really should move here: #212
Please see comments just added there.
The latest docker image worked fine for me when launched from within Manjaro Linux running in WSL 2 on Windows 11 when I had an NVIDIA RTX 4070 Ti 16GB, but the container fails to start now that I've replaced the GPU with an NVIDIA RTX 5090.
From reading the docker container's logs, it looks like the bundled version of PyTorch doesn't support 50-series GPUs. This is not a Windows problem. Recommendations elsewhere are to use the nightly version of PyTorch, but a more recent stable version of PyTorch might also contain support for it. The URL in the logs, https://pytorch.org/get-started/locally/, doesn't specifically mention the 50-series GPUs, so I'm not sure.
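For what it's worth, installing a nightly wheel built against CUDA 12.8 would look roughly like this (that's PyTorch's nightly index; I haven't verified it resolves cleanly inside this container's environment):
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128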
For reference, here is the docker container's log for one failed attempt at startup:
==========
== CUDA ==
==========
CUDA Version 12.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2025-03-27 17:08:17.532 | INFO | __main__:download_model:60 - Model files already exist and are valid
INFO: Started server process [30]
INFO: Waiting for application startup.
05:08:21 PM | INFO | main:57 | Loading TTS model and voice packs...
05:08:21 PM | INFO | model_manager:38 | Initializing Kokoro V1 on cuda
05:08:21 PM | DEBUG | paths:101 | Searching for model in path: /app/api/src/models
05:08:21 PM | INFO | kokoro_v1:45 | Loading Kokoro model on cuda
05:08:21 PM | INFO | kokoro_v1:46 | Config path: /app/api/src/models/v1_0/config.json
05:08:21 PM | INFO | kokoro_v1:47 | Model path: /app/api/src/models/v1_0/kokoro-v1_0.pth
/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:123: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn(
/app/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
/app/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:235: UserWarning:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(
05:08:22 PM | ERROR | main:70 | Failed to initialize model: Warmup failed: Failed to load model: Failed to load Kokoro model: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR: Traceback (most recent call last):
File "/app/api/src/inference/kokoro_v1.py", line 53, in load_model
self._model = self._model.cuda()
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1053, in cuda
return self._apply(lambda t: t.cuda(device))
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 290, in _apply
self._init_flat_weights()
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
self.flatten_parameters()
File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 271, in flatten_parameters
torch._cudnn_rnn_flatten_weight(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/src/inference/model_manager.py", line 127, in load_model
await self._backend.load_model(path)
File "/app/api/src/inference/kokoro_v1.py", line 58, in load_model
raise RuntimeError(f"Failed to load Kokoro model: {e}")
RuntimeError: Failed to load Kokoro model: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/src/inference/model_manager.py", line 66, in initialize_with_warmup
await self.load_model(model_path)
File "/app/api/src/inference/model_manager.py", line 131, in load_model
raise RuntimeError(f"Failed to load model: {e}")
RuntimeError: Failed to load model: Failed to load Kokoro model: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/home/appuser/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/home/appuser/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/home/appuser/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/home/appuser/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/home/appuser/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/api/src/main.py", line 65, in lifespan
device, model, voicepack_count = await model_manager.initialize_with_warmup(
File "/app/api/src/inference/model_manager.py", line 99, in initialize_with_warmup
raise RuntimeError(f"Warmup failed: {e}")
RuntimeError: Warmup failed: Failed to load model: Failed to load Kokoro model: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR: Application startup failed. Exiting.
IMNSHO, I'd suspect the 50 series GPU and associated drivers, etc. It's been suggested more than once they have...um...issues.
Using containers built with "docker run" under Docker 4.39.0 under Win 11 24H2, on a Legion 7 with an i9 and RTX4090... no worries.
@RBEmerson970 It's not a problem in the drivers or GPU. It's just a new GPU architecture like the 40 series was years ago. It takes time for a version of PyTorch with support for a new GPU architecture to make its way through nightly builds before it appears in a stable release. Since Kokoro-FastAPI's docker build doesn't use a version of PyTorch that supports the 50 series, it won't run on it yet.
To put it another way, it's like trying to run an ARM binary on an x86 PC. This isn't a bug or defect in the hardware and has nothing to do with the OS. It's just a new architecture.
According to the PyTorch forum, version 2.7 of PyTorch with support for the 50 series is planned for April. I spent some time trying to get Kokoro-FastAPI running on a nightly version of PyTorch, but couldn't get it to resolve the pytorch-triton dependency. Maybe someone else will have better luck, but I'll just have to wait for PyTorch 2.7 to be released.
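If anyone wants to confirm what a given PyTorch build was compiled for, printing its architecture list shows it (the 5090 needs sm_120). Run this in whatever environment the server actually uses:
python -c "import torch; print(torch.__version__, torch.cuda.get_arch_list())"
If sm_120 isn't in that list, the build can't run kernels on Blackwell, which matches the warning in the log above.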
It can also be started from source code instead of Docker.
Just refer to start-gpu.ps1 or start-cpu.ps1.
I tested GPU mode on Windows 10.
git clone --depth=1 https://github.com/remsky/Kokoro-FastAPI
cd Kokoro-FastAPI
uv venv    (optionally: uv venv --python 3.10)
.venv\Scripts\activate.bat
set PHONEMIZER_ESPEAK_LIBRARY="<fullpath>\eSpeak NG\libespeak-ng.dll"
set PYTHONUTF8=1
set PROJECT_ROOT=%cd%
set USE_GPU=true
set USE_ONNX=false
set PYTHONPATH=%PROJECT_ROOT%;%PROJECT_ROOT%\api
set MODEL_DIR=src\models
set VOICES_DIR=src\voices\v1_0
set WEB_PLAYER_PATH=%PROJECT_ROOT%\web
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
uv pip install -e ".[gpu]"
uv run --no-sync python docker/scripts/download_model.py --output api/src/models/v1_0
uv run --no-sync uvicorn api.src.main:app --host 127.0.0.1 --port 8880
Finally, you can create a start_kokoro-fastapi.bat:
@echo off
cd <fullpath>\Kokoro-FastAPI
call .venv\Scripts\activate.bat
set PHONEMIZER_ESPEAK_LIBRARY="<fullpath>\eSpeak NG\libespeak-ng.dll"
set PYTHONUTF8=1
set PROJECT_ROOT=%cd%
set USE_GPU=true
set USE_ONNX=false
set PYTHONPATH=%PROJECT_ROOT%;%PROJECT_ROOT%\api
set MODEL_DIR=src\models
set VOICES_DIR=src\voices\v1_0
set WEB_PLAYER_PATH=%PROJECT_ROOT%\web
uv run --no-sync python docker/scripts/download_model.py --output api/src/models/v1_0
uv run --no-sync uvicorn api.src.main:app --host 127.0.0.1 --port 8880
pause
Then you can run the .bat file to start it.
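Once it's running, a quick smoke test from another prompt could look like this (the route and JSON payload follow the project's advertised OpenAI-compatible speech endpoint, and the voice name is just an example, so adjust as needed):
curl http://127.0.0.1:8880/web/
curl -X POST http://127.0.0.1:8880/v1/audio/speech -H "Content-Type: application/json" -d "{\"model\": \"kokoro\", \"input\": \"Hello from Kokoro\", \"voice\": \"af_bella\"}" -o test.mp3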
A simple fix for Blackwell/50-series is to change the torch version now that PyTorch 2.7.0 is live:
- Modify the pyproject.toml:
...
[project.optional-dependencies]
gpu = [
"torch==2.7.0+cu128",
]
cpu = [
"torch==2.7.0",
]
...
[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
...
- Build the image:
cd docker/gpu
docker compose build
- Run it:
docker run --gpus all -p 8880:8880 kokoro-tts-gpu-kokoro-tts:latest
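Alternatively, after the rebuild you can start it from docker/gpu the same way as the original compose instructions, just without --build:
docker compose up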