demucs.cpp

Realtime processing

Open hasaranga opened this issue 8 months ago • 4 comments
Is it possible to use this for real-time processing? I want to extract vocals from an incoming audio stream and play them on a separate audio device.

hasaranga avatar Mar 22 '25 04:03 hasaranga

I can't imagine that it's possible with the current code and AI model architecture.

Demucs works by splitting the input audio into segments 7.8 seconds long (343980 samples @ 44100 Hz sample rate - you'll see the value 343980 in a lot of places in this code).
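To make the arithmetic behind those numbers concrete, here is a minimal sketch. The constant and function names are made up for illustration and are not taken from the demucs.cpp source; only the values 343980 and 44100 come from the comment above.

```cpp
#include <cstddef>

// Values quoted in the comment above. The names are hypothetical,
// not identifiers from demucs.cpp.
constexpr std::size_t kSegmentSamples = 343980;
constexpr std::size_t kSampleRateHz = 44100;

// Duration of one segment in seconds: samples / sample rate.
constexpr double segment_seconds() {
    return static_cast<double>(kSegmentSamples) /
           static_cast<double>(kSampleRateHz);
}

// Number of samples in an audio block of the given length in milliseconds.
constexpr std::size_t samples_for_ms(std::size_t ms) {
    return kSampleRateHz * ms / 1000;
}
```

With these definitions, `segment_seconds()` gives the 7.8 seconds mentioned above, while `samples_for_ms(10)` gives only 441 samples for a typical realtime block.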

Anything realtime (say, 10 ms chunks) would have to be padded with an insane amount of zeros before passing through the neural network, so more than 99% of the 7.8 seconds would be useless and only 10 ms would be real music. Even then, the question is how fast the results for that 10 ms chunk (padded to 7.8 s) can be computed.
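A rough sketch of the padding described above, to show how little of the segment would be real audio. This assumes a single mono channel stored as a `std::vector<float>`, and the helper names `pad_to_segment` and `useful_fraction` are hypothetical; it is not how demucs.cpp itself prepares its input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Segment size quoted earlier in the thread (7.8 s at 44100 Hz).
constexpr std::size_t kSegmentSamples = 343980;

// Hypothetical helper: copy a short chunk to the start of a zero-filled
// buffer that is exactly one segment long.
std::vector<float> pad_to_segment(const std::vector<float>& chunk) {
    assert(chunk.size() <= kSegmentSamples);
    std::vector<float> segment(kSegmentSamples, 0.0f);
    std::copy(chunk.begin(), chunk.end(), segment.begin());
    return segment;
}

// Fraction of the padded segment that holds real audio.
double useful_fraction(std::size_t chunk_samples) {
    return static_cast<double>(chunk_samples) /
           static_cast<double>(kSegmentSamples);
}
```

For a 10 ms chunk (441 samples), `useful_fraction(441)` is about 0.0013, so roughly 0.13% of what the network processes would be real audio and the rest would be zeros.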

For legitimate realtime processing, I think re-training demucs with a smaller segment size is an important first step.

sevagh avatar Mar 22 '25 20:03 sevagh

Yes, I found this VST plugin, https://neutone.ai/fx, and they did it just as you said. But the separation quality is low.

hasaranga avatar Mar 23 '25 03:03 hasaranga

I have my own trained model that I believe I designed for realtime usage (but with low quality): https://github.com/sevagh/xumx-sliCQ

I never really tried it in a serious application, but if you're interested, I'm curious whether it works well.

sevagh avatar Mar 23 '25 08:03 sevagh

I will try!

hasaranga avatar Mar 23 '25 17:03 hasaranga