voice-changer
Repeated sounds at low input chunk num
Running on a computer with an i7-10700K and a 4080, I start to run into issues in both server mode and client mode at low input chunk nums. In server mode, the voice starts sounding like it's repeating sounds and hitching. An input chunk num of 80, for example, does it. However, my CPU and GPU don't appear to be maxed out.
This is using an .onnx file with an RVC v2 model.
What ends up being the bottleneck in terms of hardware? Is there a way to improve this locally in how I configure the software?
I am not sure exactly, as it depends on the environment, but it could be a limitation of the RVC processing method rather than the H/W.
Any idea what would cause performance to jump around? Earlier I was able to run at chunk num 80 with no problem in client mode, but with those same settings I'm now getting hitching even at chunk num 112, no matter how I adjust things.
Performance seems inconsistent even with the same configuration and parameters.
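One way to check whether the inconsistency comes from processing time itself is to record the time of many chunks and look at the tail rather than the average. A median well under the buffer combined with a 99th percentile above it would be consistent with "fine earlier, hitching now". This is a minimal sketch: `fake_inference` is a hypothetical stand-in for one conversion call, and the 50 ms buffer is an arbitrary example value, not something taken from voice-changer.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_chunks(process: Callable[[], None], runs: int = 200) -> List[float]:
    """Call `process` repeatedly and return each call's wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        process()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms: List[float], buffer_ms: float) -> Dict[str, float]:
    """Median, 99th percentile, worst case, and share of chunks over the buffer."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(timings_ms, n=100)[98]
    over = sum(t > buffer_ms for t in timings_ms)
    return {
        "median_ms": statistics.median(timings_ms),
        "p99_ms": p99,
        "max_ms": max(timings_ms),
        "over_buffer_pct": 100.0 * over / len(timings_ms),
    }


if __name__ == "__main__":
    # Hypothetical placeholder workload; swap in a real per-chunk conversion call.
    def fake_inference() -> None:
        sum(i * i for i in range(50_000))

    print(summarize(time_chunks(fake_inference), buffer_ms=50.0))
```

Running the same measurement at different times of day, or with other GPU-using programs open, makes it easier to see whether the tail moves while the settings stay fixed.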
I've also experienced some inconsistency like that. It's hard to say exactly what causes it in my case, as conditions vary a lot, but I've found that chunk num 64 is sometimes stable on a 3080 Ti with an ONNX model, while other times it can be a bit rough, with random pitch jumps. Subjectively, instability, pitch jumping, and needing to increase the chunk num have felt more common since 1.5.3.2, even with RVC v1 models, but it's hard to isolate the variables.
I'll also add that the repeated sounds seem to be what happens in server mode when the res time significantly exceeds the buffer. In client mode, the voice cuts out for a moment when res exceeds the buffer. In server mode you get a "broken record" result instead, where a segment of audio is repeated a few times. I've noticed that res can exceed the buffer by a small amount without issue, but when it exceeds it by more than a couple of ms you get those effects.
The idea of it being a limitation of RVC makes sense, as it does seem that even with plenty of GPU headroom it won't use all of it, even if that means failing to infer the audio segment in time.
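The deadline behaviour described above can be modelled as a per-chunk check: if a chunk's res time is within the buffer plus a small tolerance the audio is fine; otherwise server mode repeats audio and client mode drops it. This is a sketch under stated assumptions. The mapping from chunk num to buffer length (128 samples per unit at 48 kHz) is a hypothetical placeholder rather than something confirmed from voice-changer's code, so in practice read the buf value the GUI reports. The 2 ms tolerance just restates "a couple of ms" from the comment, and the symptom labels restate what was observed in this thread.

```python
# Assumptions, not taken from voice-changer's code: one chunk-num unit is 128
# samples at 48 kHz, and overruns up to ~2 ms go unnoticed (per the comment above).
SAMPLES_PER_CHUNK_UNIT = 128  # hypothetical
SAMPLE_RATE = 48_000          # hypothetical
TOLERANCE_MS = 2.0


def buffer_ms(chunk_num: int) -> float:
    """Audio duration covered by one input chunk, in milliseconds."""
    return chunk_num * SAMPLES_PER_CHUNK_UNIT / SAMPLE_RATE * 1000.0


def classify(res_ms: float, buf_ms: float, mode: str) -> str:
    """Label the audible result of one chunk given its response time."""
    if res_ms <= buf_ms + TOLERANCE_MS:
        return "ok"
    # Symptoms as reported in this thread, not a specification of the software.
    return "repeat" if mode == "server" else "dropout"


if __name__ == "__main__":
    buf = buffer_ms(80)  # about 213.3 ms under the assumed mapping above
    for res in (200.0, 214.5, 230.0):
        print(f"res={res} ms -> server: {classify(res, buf, 'server')}, "
              f"client: {classify(res, buf, 'client')}")
```

The point of the model is that lowering chunk num shrinks the buffer linearly, while res time has a floor set by the per-chunk processing cost, so below some chunk num every jitter spike becomes audible.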
Hey, could you verify that 1.5.3.2 uses the GPU when running ONNX with RVC? I think there might be a problem where it doesn't, and I'm currently trying to verify that, find out why, and submit a fix. For me it runs on the CPU and is therefore slower. I could use a non-ONNX model, but that was always a little slower, and ONNX on the GPU was the most stable and fastest option. At first I thought only the dropdown was missing, but it's supposed to work with the GPU num selector now (see #246). For me, though, it stays on the CPU.
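A quick way to check the ONNX side independently of the app is to ask ONNX Runtime which execution providers it has and which ones a session actually ended up with. This sketch uses the real `onnxruntime` calls `get_available_providers()`, `InferenceSession(..., providers=...)` and `session.get_providers()`; it does not inspect how voice-changer itself builds its session. `model.onnx` is a hypothetical path, and on a DirectML build the GPU provider would appear as `DmlExecutionProvider` rather than `CUDAExecutionProvider`.

```python
import os
from typing import List

try:
    import onnxruntime as ort
except ImportError:  # keep the helper usable even without onnxruntime
    ort = None


def runs_on_gpu(active_providers: List[str]) -> bool:
    """True if the session's highest-priority provider is not the CPU one."""
    return bool(active_providers) and active_providers[0] != "CPUExecutionProvider"


if __name__ == "__main__":
    model_path = "model.onnx"  # hypothetical; point it at your exported RVC model
    if ort is None:
        print("onnxruntime is not installed")
    else:
        # If no GPU provider is listed here, the installed package is CPU-only.
        print("available:", ort.get_available_providers())
        if os.path.exists(model_path):
            session = ort.InferenceSession(
                model_path,
                providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
            )
            active = session.get_providers()
            print("active:", active, "-> GPU" if runs_on_gpu(active) else "-> CPU only")
```

If `available` lacks a GPU provider, the problem is the installed ONNX Runtime package rather than the GPU num selector.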
Yes it does. I just checked a PyTorch model and its corresponding ONNX model, and they both used my GPU. CPU usage was high with both as well. I didn't notice any major difference between them in proportions either. In converter settings, my GPU is set to 0.
If you use Harvest as the f0 (pitch) detector, it will increase your CPU usage. Harvest is high quality but very resource-intensive. Dio is lower quality than Harvest but lightweight. Both Harvest and Dio run on the CPU, whereas Crepe runs on the GPU. Crepe is high quality like Harvest and also resource-intensive, but it can make use of the GPU. Crepe support will be released in the next version.
Crepe has been released.
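The CPU cost difference between Harvest and Dio can be seen directly with `pyworld`, the Python wrapper of the WORLD vocoder where both algorithms come from. This sketch assumes the f0 code in use relies on those WORLD implementations; it calls `pyworld.dio` and `pyworld.harvest` with their default parameters on a synthetic tone, so absolute timings will differ from real-time use, where the app may pass its own f0 floor, ceiling and frame period. Crepe is left out because it needs a separate neural model on the GPU.

```python
import math
import time
from typing import Callable, List

try:
    import numpy as np
    import pyworld
except ImportError:  # the helpers below still work without them
    np = pyworld = None


def sine(fs: int = 16_000, seconds: float = 1.0, hz: float = 220.0) -> List[float]:
    """A plain test tone; real use would load a voice recording instead."""
    return [0.5 * math.sin(2 * math.pi * hz * n / fs) for n in range(int(fs * seconds))]


def time_ms(fn: Callable, *args) -> float:
    """Wall-clock time of a single call, in milliseconds."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    fs = 16_000
    if pyworld is None:
        print("numpy and pyworld are required for this comparison")
    else:
        x = np.asarray(sine(fs), dtype=np.float64)  # pyworld expects float64 input
        # Both calls return (f0, time_axis); only the elapsed time matters here.
        print(f"dio:     {time_ms(pyworld.dio, x, fs):.1f} ms")
        print(f"harvest: {time_ms(pyworld.harvest, x, fs):.1f} ms")
```

Because this cost is paid on every chunk, a slower f0 method eats directly into the same buffer budget discussed earlier in the thread.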