
[Discussion] How does spleeter handle stereo files when training?

Open • dustyny opened this issue 2 years ago • 2 comments

I'm creating a dataset for training my own model. At the moment my dataset consists of mono loops. I'm trying to figure out whether I should be generating stereo, but I don't know how Spleeter handles mono vs. stereo. I've tried digging through the code, but I'm a novice with TensorFlow and this is a complex project.

Does it train on a stereo WAV, or does it split the WAV into two mono WAVs (left and right channels)? If it uses both channels at once, are they processed in the context of one another? If I have sounds panned hard left and right, does that matter?

If I have 1,000 stereo sets, does it treat them as 2,000 examples or just 1,000?

Much appreciated.

dustyny, Apr 25 '22

Hi @dustyny, the pre-trained models of Spleeter were trained on stereo files. Both channels are fed synchronously as input to the model. A trained model should be able to deal with panning variations if it was trained on such data (data augmentation may be needed to support this). 1,000 stereo snippets will be treated as 1,000 samples. It should be possible, though, to train new models on mono signals using the n_channels option in the config file.
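
To make "both channels are fed synchronously" and "1,000 snippets = 1,000 samples" concrete, here is a minimal NumPy sketch (not Spleeter's actual code) of how a stereo clip can become a single training example: each channel goes through the same STFT and the results are stacked on a trailing channel axis, so left and right stay aligned frame by frame instead of becoming two separate examples. The helper names `stft_magnitude` and `to_example`, and the frame length and hop size, are assumptions for illustration; the real pipeline has further steps (batching, cropping and so on) that are omitted here.

```python
import numpy as np

def stft_magnitude(x, frame_length=4096, frame_step=1024):
    """Magnitude STFT of a 1-D signal using a Hann window (illustrative only)."""
    if len(x) < frame_length:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_length)
    n_frames = 1 + (len(x) - frame_length) // frame_step
    frames = np.stack([
        x[i * frame_step : i * frame_step + frame_length] * window
        for i in range(n_frames)
    ])
    # One row per frame, frame_length // 2 + 1 frequency bins per row.
    return np.abs(np.fft.rfft(frames, axis=-1))

def to_example(waveform, frame_length=4096, frame_step=1024):
    """Turn a (n_samples,) or (n_samples, n_channels) waveform into ONE
    (frames, bins, channels) array; channels are stacked, not split apart."""
    if waveform.ndim == 1:  # mono: add a trailing channel axis
        waveform = waveform[:, None]
    specs = [stft_magnitude(waveform[:, c], frame_length, frame_step)
             for c in range(waveform.shape[1])]
    return np.stack(specs, axis=-1)

# Ten 1-second stereo clips at 44.1 kHz give ten examples, not twenty.
rng = np.random.default_rng(0)
clips = [rng.standard_normal((44100, 2)) for _ in range(10)]
examples = [to_example(clip) for clip in clips]
print(len(examples), examples[0].shape)  # 10 (40, 2049, 2)
```

Because the channel axis is part of each example, a model that sees it can compare left and right within the same time-frequency bin, which is what makes panning information available to it at all.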

romi1502, Apr 29 '22
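
For the mono option mentioned in the reply, one low-risk approach is to copy an existing training config and change only `n_channels`. This sketch assumes the config is a JSON file with `n_channels` as a top-level key; the function name `make_mono_config` and the file names in the usage line are hypothetical, and every other key is left untouched.

```python
import json
from pathlib import Path

def make_mono_config(src, dst):
    """Copy a training config, switching only the channel count to mono (sketch)."""
    config = json.loads(Path(src).read_text())
    if "n_channels" not in config:
        # Assumption: n_channels is a top-level key; fail loudly rather than guess.
        raise KeyError(f"'n_channels' not found at the top level of {src}")
    config["n_channels"] = 1
    Path(dst).write_text(json.dumps(config, indent=4))
    return config

# Hypothetical usage:
# make_mono_config("my_config.json", "my_config_mono.json")
```

Failing on a missing key, rather than silently adding one, avoids producing a config that looks mono but is ignored by the training code.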

@romi1502 thank you for that answer, very helpful! I have started training; my test run on 5k mono drum loops ended with these results. TBH I don't know what they mean. What does good look like?

absolute_difference = 1.7624866, global_step = 200000, loss = 1.7624866, brass_perc_spectrogram = 0.3552358, kicks_spectrogram = 0.3987537, pitched_perc_spectrogram = 0.5570321, snares_claps_spectrogram = 0.45146513
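
One way to read this log: the four per-stem `*_spectrogram` values add up to the reported `loss` (and `absolute_difference`), which suggests the total is the sum of per-stem absolute-difference errors between predicted and target spectrograms. That is an inference from these numbers, not something stated in the thread. The check below only does the addition. Lower is better, but the absolute scale depends on the data, so how the values evolve on a held-out validation set is more informative than any single number.

```python
# Per-stem values copied from the training log above.
per_stem = {
    "brass_perc_spectrogram": 0.3552358,
    "kicks_spectrogram": 0.3987537,
    "pitched_perc_spectrogram": 0.5570321,
    "snares_claps_spectrogram": 0.45146513,
}
reported_loss = 1.7624866

total = sum(per_stem.values())
# Any tiny residual comes from the log printing float32 values with limited precision.
print(f"sum of stems = {total:.7f}, reported loss = {reported_loss}")
print("matches:", abs(total - reported_loss) < 1e-5)
```

Read this way, the per-stem numbers also show where the model struggles most: here `pitched_perc` carries the largest share of the total error.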

If the model is trained on only one channel, do you think that could improve or reduce accuracy? I'm training on 55k mono loops now. Hopefully that will give a good result 🤞.
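
If you do decide to generate stereo from mono loops, simply duplicating the mono signal into both channels only produces centre-panned material, which gives a stereo model nothing to learn about panning. A common alternative, and one possible form of the data augmentation suggested in the reply above, is to pan each mono stem randomly and then rebuild the mix from the panned stems, so that the mix and the targets stay consistent. The sketch below uses a constant-power pan law; the function names, the pan range and the stem names are assumptions, not part of Spleeter.

```python
import numpy as np

def pan_mono(x, pan):
    """Constant-power pan of a mono signal: pan=-1 is hard left, 0 centre, +1 hard right."""
    theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    # cos^2 + sin^2 = 1, so total power is the same at every pan position.
    return np.stack([np.cos(theta) * x, np.sin(theta) * x], axis=-1)  # (n_samples, 2)

def make_stereo_example(stems, rng, max_pan=1.0):
    """Pan each mono stem independently, then sum the panned stems into the stereo mix."""
    panned = {name: pan_mono(x, rng.uniform(-max_pan, max_pan))
              for name, x in stems.items()}
    mix = sum(panned.values())  # the mix is built from the same panned stems used as targets
    return mix, panned

rng = np.random.default_rng(42)
stems = {"kicks": rng.standard_normal(44100), "snares_claps": rng.standard_normal(44100)}
mix, targets = make_stereo_example(stems, rng)
print(mix.shape)  # (44100, 2)
```

Drawing a fresh pan each time a loop is used lets the same mono material appear at many stereo positions without storing extra files.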

dustyny, May 03 '22