spleeter
[Discussion] How does spleeter handle stereo files when training?
I'm creating a dataset for training my own model. At the moment my dataset consists of mono loops. I'm trying to figure out whether I should be generating stereo, but I don't know how Spleeter handles mono vs. stereo. I've tried digging through the code, but I'm a novice with TensorFlow and this is a complex project.
Does it train on a stereo wav, or does it split the wav up into 2 mono wavs (left and right channels)? If it uses both channels at once, are they in context of one another? If I have sounds panned hard left and right, does that matter?
If I have 1,000 stereo sets, does it treat them as 2,000 examples or just 1,000?
Much appreciated.
Hi @dustyny,
The pre-trained models of Spleeter were trained on stereo files. Both channels are fed synchronously as input to the model. A trained model should be able to deal with panning variations if it was trained with such data (data augmentation may be needed to support this). 1000 stereo snippets will be treated as 1000 samples.
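To make the sample counting and the panning remark concrete, here is a minimal NumPy sketch. It is not Spleeter's own data pipeline: the array shapes and the helper names `swap_channels` and `rebalance` are assumptions chosen for illustration. It only shows that a stereo snippet kept as one two-channel array counts as one example, and what a simple panning augmentation could look like.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 1000 stereo snippets, 1024 audio samples each, 2 channels.
n_snippets, n_samples, n_channels = 1000, 1024, 2
stereo_batch = rng.standard_normal((n_snippets, n_samples, n_channels)).astype(np.float32)

# Both channels stay together in each snippet, so the number of training
# examples equals the number of snippets (1000), not snippets * channels.
n_examples = stereo_batch.shape[0]


def swap_channels(snippet: np.ndarray) -> np.ndarray:
    """Exchange left and right channels of a (samples, 2) snippet."""
    return snippet[:, ::-1]


def rebalance(snippet: np.ndarray, pan: float) -> np.ndarray:
    """Apply constant-power left/right gains; pan is in [-1, 1], 0 is centre."""
    theta = (pan + 1.0) * np.pi / 4.0
    gains = np.array([np.cos(theta), np.sin(theta)], dtype=snippet.dtype)
    return snippet * gains


# Assumption: in a real setup the same transform would be applied to the mix
# and to every target stem so that they stay consistent.
augmented = rebalance(swap_channels(stereo_batch[0]), pan=rng.uniform(-1.0, 1.0))
print(n_examples, augmented.shape)
```

Whether this kind of augmentation helps depends on how much panning variation the target material actually contains.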
It should be possible to train new models with mono signal using the n_channels option in the config file, though.
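As a rough sketch of that change, the snippet below edits a config held as a Python dict and serializes it back to JSON. Only `n_channels` comes from the answer above; the `instrument_list` key and its values are assumptions based on the stem names in this thread, and a real config file will contain more keys than shown here.

```python
import json

# Hypothetical, heavily trimmed training config. Only "n_channels" is
# confirmed by the thread; "instrument_list" is an illustrative assumption.
config = {
    "instrument_list": ["kicks", "snares_claps", "brass_perc", "pitched_perc"],
    "n_channels": 2,
}

# Switch the model input from stereo to mono.
config["n_channels"] = 1

# Serialize the result; in practice this would be written to the config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```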
@romi1502 thank you for that answer, very helpful! I have started training; my test run on 5k mono drum loops ended with these results. TBH I don't know what they mean. What does good look like?
absolute_difference = 1.7624866, global_step = 200000, loss = 1.7624866, brass_perc_spectrogram = 0.3552358, kicks_spectrogram = 0.3987537, pitched_perc_spectrogram = 0.5570321, snares_claps_spectrogram = 0.45146513
If the model is only being trained on one channel, do you think that could improve or reduce accuracy? I'm training on 55k mono loops now. Hopefully that will give a good result 🤞.
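One thing that can be read off the reported numbers directly: the four per-stem `*_spectrogram` values add up to the reported `loss` to within rounding, which suggests the total is the sum of the per-stem terms. The short check below uses only the figures quoted above. It says nothing about what counts as a good value; the name `absolute_difference` hints at an L1-style distance, but that is an assumption.

```python
import math

# Per-stem values copied from the results quoted in this thread.
metrics = {
    "brass_perc_spectrogram": 0.3552358,
    "kicks_spectrogram": 0.3987537,
    "pitched_perc_spectrogram": 0.5570321,
    "snares_claps_spectrogram": 0.45146513,
}
reported_loss = 1.7624866

# Sum the per-stem terms and compare with the reported total.
summed = sum(metrics.values())
matches = math.isclose(summed, reported_loss, rel_tol=1e-5)

# Rank stems from highest to lowest term to see which one dominates the total.
ranked = sorted(metrics, key=metrics.get, reverse=True)
print(matches, ranked[0])
```

Comparing the per-stem terms against each other, and tracking them on a held-out validation set over training, is likely more informative than the absolute number on its own.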