[Bug]: Realtime mode perf issues with ROCm
Project Version
3.5.0
Platform and OS Version
Linux, ROCm 6.4, gfx1100
Affected Devices
N/A
Existing Issues
No response
What happened?
Mostly creating this to document my findings and start a conversation among AMD users.
Even after tuning the MIOpen FindDB, real-time mode results in high GPU usage and spotty audio. The output is unstable and can't hold for even one second without breaking apart.
By comparison, deiteris/voice-changer uses <20% GPU while providing continuous output at a 128 ms buffer size and 1.6 s extra.
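For reference, the FindDB tuning I mean looks roughly like this; a minimal sketch assuming MIOpen's standard tuning environment variables, with the launch command as a placeholder for however you start real-time mode:

```bash
# Ask MIOpen to exhaustively search for the fastest conv kernels and
# cache the winners in the user FindDB, so later runs skip the search.
export MIOPEN_FIND_MODE=NORMAL
export MIOPEN_FIND_ENFORCE=SEARCH_DB_UPDATE
python app.py  # placeholder launch command; run a session to warm the cache
```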
Steps to reproduce
- Create venv
- Install ROCm versions of torch, torchvision, torchaudio
- Install the remaining requirements from requirements.txt
- Start real-time mode
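A sketch of that setup (assuming the rocm6.4 wheel index matches the installed ROCm; adjust for your version):

```bash
python -m venv .venv
source .venv/bin/activate
# ROCm builds of torch/torchvision/torchaudio from the official PyTorch index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4
# everything else from the project's requirements file
# (if requirements.txt pins torch itself, see the workaround discussed below)
pip install -r requirements.txt
```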
Expected behavior
I would expect it to be able to provide continuous output on a high-end card.
Attachments
No response
Screenshots or Videos
No response
Additional Information
No response
I upgraded my GPU from an unsupported 5500 XT to a 9060 XT, which ROCm supports out of the box, and I still have the same problem.
The real-time feature is currently unusable for me, even after tuning MIOpen. Regardless of the performance settings, the output voice always cuts out midway through. In contrast, the original RVC does not have this issue.
Tested again with 3.6.0 using --client audio, ROCm 7.1
- 128 ms chunk size and 1.6 s extra → unusable
- 256 ms chunk size and 1.6 s extra → unusable
- 384 ms chunk size and 1.6 s extra → unusable
- 512 ms chunk size and 1.6 s extra → works and sounds OK but uses 100% of the GPU
- 512 ms chunk size and 0.5 s extra (default setting) → works and sounds OK but uses 100% of the GPU
It reports a latency of around 300 ms, while the actual delay is above 1 s.
I don't have extensive experience with ROCm, but I believe this performance issue has multiple causes. For example, Applio currently lacks ONNX Runtime optimizations for acceleration, and real-time processing does not support FP16, which leads to increased GPU usage.
Regarding the reported latency being lower than the actual delay, you are correct: the displayed latency only reflects the real-time conversion pipeline and does not include the WebSocket transmission pipeline.
@mitsuami-megane are you testing with a TheRock build?
No, what I did is:
- Git cloned tag 3.6.0
- Created the venv
- Commented out `torch`, `torchaudio`, and `torchvision` from `requirements.txt`
- Installed the matching versions of `torch`, `torchaudio`, and `torchvision` using the official PyTorch ROCm index instead
- Installed the rest of the dependencies
This is what I normally do with other torch-based software, and it seems to work well in most cases.
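In shell terms, that amounts to roughly the following (a sketch; the versions installed should match whatever requirements.txt pins):

```bash
# comment out the pinned torch packages so pip doesn't pull non-ROCm builds
sed -i -E 's/^(torch|torchaudio|torchvision)/# \1/' requirements.txt
# install the matching versions from the official PyTorch ROCm index instead
pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/rocm6.4
# then the rest of the dependencies
pip install -r requirements.txt
```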
Where does one find this TheRock build?
This one?
`--index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/`
https://download.pytorch.org/whl/rocm6.4 is what I was using, because it has a PyTorch version matching the one pinned in requirements.txt.
With ROCm 7.1.1. As far as I'm aware, PyTorch targeting older ROCm works fine on newer ROCm installations.
I'm not sure how happy Applio would be on a completely different PyTorch version.
Well, you can try with the index I've provided and report any issues.
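For reference, trying that index would look something like this (a sketch; I believe TheRock publishes separate indexes per GPU family, with gfx120X-all targeting RDNA4 cards such as the 9060 XT, so pick the one matching your card):

```bash
# replace the existing ROCm 6.4 wheels with TheRock nightlies
pip uninstall -y torch torchaudio torchvision
# --pre in case the nightly wheels are tagged as pre-releases
pip install --pre torch torchaudio torchvision \
    --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/
```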