Jarredou

Results 49 comments of Jarredou

I think only MDX23C models trained with this script can currently be imported into UVR (like InstVocHQ), and probably Mel/BS-Roformers in an upcoming update.

@HeChengHui This paper was published a few days ago and demonstrates music source separation with 23 ms latency (but don't expect top-quality results at that processing speed): https://arxiv.org/abs/2402.17701

Dataset name: StemGMD Description: A Large-Scale Audio Dataset of Isolated Drum Stems for Deep Drums Demixing Instruments: Drums [Kick Drum, Snare, High Tom, Low-Mid Tom, High Floor Tom, Closed Hi-Hat,...

Some augmentations can also cause slowdowns during training when enabled (in particular pitch-shifting, time-stretching and MP3 encoding), and at least some of them, if not all, run on the CPU....

If it's pedalboard's distortion that was slow, I would recommend removing that augmentation entirely, as it also creates huge gain changes, while audiomentations has a better alternative like tanh that...
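A minimal sketch of why tanh-style distortion is gentler on levels: tanh soft-clips the waveform, and rescaling to the input peak keeps the overall gain roughly unchanged. This is an illustrative NumPy implementation, not the actual audiomentations `TanhDistortion` code; the `drive` parameter and rescaling strategy are assumptions for the example.

```python
import numpy as np

def tanh_distortion(audio: np.ndarray, drive: float = 4.0) -> np.ndarray:
    """Soft-clip a waveform with tanh, then rescale to preserve the input peak.

    `drive` controls how hard the signal is pushed into the tanh curve
    (hypothetical parameter name for this sketch).
    """
    driven = np.tanh(drive * audio)
    peak_in = np.max(np.abs(audio)) + 1e-12
    peak_out = np.max(np.abs(driven)) + 1e-12
    # Rescale so the distorted signal peaks at the same level as the input,
    # avoiding the large gain jumps the comment describes.
    return driven * (peak_in / peak_out)
```

Because the output peak matches the input peak, the augmented example stays at a comparable loudness, which matters when the training loss is sensitive to level.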

If needed, UVR's beta with Roformers and DirectML code is available here: https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add%2Bdirectml

Not really answering the question, but using (SI-)SDR as a training loss is not always great because it can't handle silent chunks properly. I recommend using https://github.com/crlandsc/torch-log-wmse/ as a waveform-based loss...
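To illustrate the silent-chunk problem, here is a minimal SI-SDR computation in plain PyTorch (a sketch, not the torch-log-wmse API): when the target stem is silence, the scaled projection of the estimate onto the target vanishes, so the ratio collapses to the epsilon floor and the loss gives a huge, uninformative negative value regardless of what the model predicted.

```python
import torch

def si_sdr(est: torch.Tensor, tgt: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SDR in dB (higher is better)."""
    # Optimal scaling of the target toward the estimate.
    alpha = (est * tgt).sum() / (tgt.pow(2).sum() + eps)
    proj = alpha * tgt          # target component of the estimate
    noise = est - proj          # everything else counts as error
    return 10 * torch.log10((proj.pow(2).sum() + eps) / (noise.pow(2).sum() + eps))

est = torch.randn(44100)        # some model output on a chunk
silent = torch.zeros(44100)     # silent target stem
# alpha becomes 0, proj becomes 0: the score degenerates to a huge
# negative number no matter how "good" the estimate is.
```

A loss that handles silence gracefully (like the log-WMSE the comment recommends) avoids this degenerate case.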

You can try my Colab fork, which is a bit more up to date; you should be able to run inference.py (after installing the requirements) on Linux without using the colab...

Yeah, I would also appreciate a command-line feature!

You can ensemble already-separated audio with https://github.com/ZFTurbo/Music-Source-Separation-Training and its ensemble.py script.
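The simplest form of what such an ensemble does can be sketched in a few lines: average several aligned separations of the same stem, sample by sample. This is an illustrative NumPy sketch under the assumption of a plain mean ensemble, not the actual ensemble.py implementation (which offers more strategies).

```python
import numpy as np

def ensemble_average(stems: list[np.ndarray]) -> np.ndarray:
    """Mean-ensemble several separations of the same source.

    Assumes all inputs are time-aligned waveforms at the same sample rate;
    trims to the shortest length before averaging (hypothetical policy
    chosen for this sketch).
    """
    n = min(len(s) for s in stems)
    return np.mean([s[:n] for s in stems], axis=0)
```

Averaging tends to cancel model-specific artifacts that disagree between separators while reinforcing the signal they agree on, which is why ensembling outputs from different models often scores better than any single model.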