piper
Can the GPU be used to create WAV files instead of the CPU?
Love everyone's work here!
Reading the README - can the GPU be used to create WAV files by using the --cuda parameter on the Python version?
Or is the Python version / CUDA only for training?
Thanks in advance. I'm trying to get the fastest TTS I can for converting large documents.
It can, but it's slower, at least for single inference.
I got it working on Ubuntu (in a virtual PC with access to an RTX 3060 via PCI-E passthrough). If I had to guess, it's due to all the overhead of setting up / preparing the GPU before finally doing an inference.
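If you want to see this on your own machine, a rough check is to time the same short input on CPU and GPU. This is only a sketch: the model path is an example, and the CUDA flag name varies between the Python and executable versions (see further down in this thread).

```shell
# Rough single-sentence timing comparison; the model path is an example.
# On many setups the CPU run wins for single inference because of the
# one-off cost of initialising CUDA and copying the model to VRAM.
echo 'The quick brown fox jumps over the lazy dog.' > sent.txt
time piper --model en_US-ryan-high.onnx -f cpu.wav < sent.txt
time piper --model en_US-ryan-high.onnx --cuda -f gpu.wav < sent.txt
```

For a fairer GPU number, time a long document instead, so the fixed setup cost is amortised over many sentences.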
Ahh yeah, that makes sense, but unfortunate. My 5950X is getting a solid workout here.
I tried, but it didn't work, I guess. I appended the --cuda parameter to audio generation and still see the CPU doing all the work instead of the RTX 4060. Is there something I am missing? I am using Windows 11 and piper.exe to test this. Would love to hear your insights.
It does work, yeah. But generation time, even after the model is loaded into VRAM, is x times longer than on the CPU, unfortunately.
I wonder why that is! Sadly I'm not up on CUDA optimization enough (yet) to understand why this might be the case. Would love to see some focus on that (but I understand it's not really the main direction of the project).
Just to confirm I understand, for use cases like the read-aloud extension where it needs to create multiple generations one after the other, GPU would likely be faster since it would be able to re-use the same state for subsequent generations, right?
Yes and no. I don't think you can hook into any CUDA environment via a browser extension.
GPUs are generally more powerful than CPUs for AI inference - so yes, you would benefit from using the GPU.
While the GPU is more powerful, you can also implement "multi-threading", i.e. load multiple model instances into normal RAM. This has nothing to do with GPU vs. CPU; you just need enough CPU resources to process multiple instances - though this will still be slower than multiple instances in VRAM on the GPU.
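For large documents specifically, one way to use multiple CPU-side instances is to split the text and run several independent piper processes in parallel. A minimal sketch, assuming a book.txt input and an example model path:

```shell
# Split the document into 50-line chunks, then run up to 4 piper
# processes at once, one chunk each (model path is an example)
split -l 50 book.txt chunk_
ls chunk_* | xargs -P 4 -I{} sh -c \
  'piper --model en_US-ryan-high.onnx -f {}.wav < {}'
```

The -P value should roughly match how many model instances your cores and RAM can hold; the chunk WAVs can then be concatenated in order.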
I see these errors when running with --cuda:
2024-12-21 22:03:25.924069708 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-12-21 22:03:25.948383739 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-12-21 22:03:25.948407229 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
How do I find out how to run it on GPU properly?
I believe that the correct flag is --use-cuda and you need to get the latest code from git in order for the flag to do anything. You can check the flags supported by your version of piper by running piper --help. If you don't see --use-cuda there, then it's not supported by the version that you have.
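A quick way to run that check from a script (the flag names here are the ones mentioned in this thread, not guaranteed for every build):

```shell
# Print any CUDA-related flags this build of piper knows about
piper --help | grep -i cuda || echo "no CUDA flag in this build"
```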
Note that piper will simply ignore any flag that it doesn't understand - you won't see an error message if you give it a flag that isn't implemented.
Also note that you will have to install some packages for CUDA to work. Exactly which packages you need depends on your distribution, but it should give you clear error messages about what is missing that will enable you to figure it out.
I'm getting a segmentation fault when I try this - I'm looking into the problem to determine the cause.
I was wrong in my previous answer, for the Python version the flag is indeed --cuda.
For the executable version the flags are --cuda --use-cuda. You seem to need to use both, and it still doesn't work - you will get a segmentation fault. I believe that this is due to linking against the libonnxruntime library instead of libonnxruntime-gpu, going by something that synesthesium said.
Many people (including myself) have found that they can't install the Python version using Pip, but you can follow these instructions to build a .whl file for piper-phonemise. You can then install piper-tts.
You also need to install onnxruntime-gpu. If you don't, Piper will silently ignore the --cuda flag and will run on the CPU only. You can verify that it is using the GPU by running nvidia-smi. If it is using the GPU then the figure for Utilisation should be between 80% and 99%.
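To check this while a generation is running, you can query utilisation directly (run piper in another terminal; wrap the command in watch -n 1 or similar to poll it):

```shell
# Report GPU utilisation and memory in use; a utilisation figure near
# 80-99% during synthesis means the inference really is on the GPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
```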
When I got it working I saw a speed improvement of about 40%.
Could you share your computing resources, e.g. CPU and GPU? I don't see that difference on my hardware (i9-14900KF, RTX 3090).
CPU is Intel i5-8400 CPU @ 2.80GHz. GPU is GeForce 1050 Ti.
Did you confirm that your GPU was active during your runs?
Thanks.
For my configuration below:
CPU: Intel(R) Core(TM) i5-4430S (4) @ 2.70 GHz
GPU: NVIDIA GeForce GTX 1050 Ti [Discrete]
I confirm that piper now uses the GPU, as I can clearly see it being loaded by using nvidia-smi.
@thetznecker Neither --cuda --use-cuda nor --cuda works for me.
@gnusupport In my case, I don't see piper loading in nvidia-smi.
I'm also using --debug, and I see no output about whether CUDA attempted to load or not.
@gnusupport
I see these errors when running with --cuda: [same onnxruntime warnings as quoted above]
I get that, too. What it dumps to stdout doesn't appear to be WAV data, but "node assignment" data.
However, when I Ctrl-C, it says:
File "/home/geremia/.local/bin/piper", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/geremia/.local/lib/python3.12/site-packages/piper/__main__.py", line 127, in main
sys.stdout.buffer.write(audio_bytes)
KeyboardInterrupt
So it seems to be writing audio bytes to stdout.
Update: In my case, it was because I specified a --conf file whose settings (e.g., --length_scale) conflicted with what I specified in command line arguments.
This now works fine for me:
cat in.txt | piper --cuda --model /tmp/piper-voices/en_US-ryan-high.onnx --sentence-silence 0.0 --length_scale 0.1 -f out.wav