Kokoro ONNX model inference fails when using the DirectML execution provider (Windows)
The model loads correctly with kokoro.provider=dml, but inference then fails with this error:
```
Error: Non-zero status code returned while running ConvTranspose node. Name:'/encoder/F0.1/pool/ConvTranspose' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2804)\onnxruntime.dll!00007FFA72289084: (caller: 00007FFA72288E34) Exception(2) tid(1670) 80070057 The parameter is incorrect.
```
It seems to be either a lack of operator support in the current ONNX Runtime DirectML execution provider, or something else entirely. I can't really make out what this error message means.
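To help narrow it down, here's a minimal sketch of reproducing the failure with onnxruntime-node directly, outside of Echogarden. The model path and the input names (input_ids, style, speed) are assumptions based on the Transformers.js Kokoro export, and the input values are dummies; check them against the actual model before relying on this:

```typescript
// Minimal repro sketch using onnxruntime-node directly (assumed model path and input names).
import * as ort from 'onnxruntime-node';

async function reproduce(modelPath: string) {
  // Session creation succeeds with the DirectML provider...
  const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ['dml'],
  });
  console.log('Model inputs:', session.inputNames);

  // ...but running inference is where the ConvTranspose error is thrown.
  // Dummy feeds, assuming the Transformers.js export's input signature.
  const feeds: Record<string, ort.Tensor> = {
    input_ids: new ort.Tensor('int64', BigInt64Array.from([0n, 50n, 83n, 0n]), [1, 4]),
    style: new ort.Tensor('float32', new Float32Array(256), [1, 256]),
    speed: new ort.Tensor('float32', Float32Array.from([1.0]), [1]),
  };
  const outputs = await session.run(feeds);
  console.log('Model outputs:', Object.keys(outputs));
}

reproduce('kokoro.onnx').catch((err) => console.error(err));
```

If this standalone script fails the same way, the problem is in the DirectML provider (or the exported graph) rather than in Echogarden's integration.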
I reported this at the Kokoro repository. It may require a re-export of the model to fix.
The .onnx export was made by the developers of Transformers.js. Maybe they can help.
Until this issue is fixed, we can't get Kokoro working with GPU acceleration on Windows, since dml is the only GPU execution provider that onnxruntime-node offers on Windows.
Update
I've verified that the cuda execution provider (--kokoro.provider=cuda) works correctly on Linux and provides a significant speed increase when a GPU is available (CUDA support in onnxruntime-node isn't currently available for Windows). This suggests the issue is specific to DirectML rather than a general problem with the model.
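For reference, here is a sketch of the equivalent session setup with the cuda provider on Linux. It assumes the kokoro.provider option maps directly onto onnxruntime-node's executionProviders list, and the model path is a placeholder:

```typescript
import * as ort from 'onnxruntime-node';

async function createCudaSession(modelPath: string) {
  // 'cuda' requires CUDA Toolkit 12.x and cuDNN 9.x to be installed.
  // Listing 'cpu' after it lets nodes the CUDA provider doesn't handle run on the CPU.
  return ort.InferenceSession.create(modelPath, {
    executionProviders: ['cuda', 'cpu'],
  });
}
```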
To get GPU acceleration on Windows for now, you can use the Windows Subsystem for Linux (WSL). You'll need to install CUDA Toolkit 12.x and cuDNN 9.x, and then completely reinstall Echogarden to enable cuda execution provider support (this is required because onnxruntime-node apparently downloads additional files during installation when it detects that CUDA is available).