UNet-VocalSeparation-Chainer
Train on stereo data
It would be nice to have the ability to train on stereo data. In terms of network structure, is it as simple as changing the number of input channels in this line to 2 and the number of output channels in this line to 2? Obviously the training data patches would need 2 channels as well.
Hi, sorry for the late reply. Sure, I think it is possible to handle stereo audio by simply doubling this model. I am also considering dropping the downsampling process to obtain higher-resolution results, and implementing it on the demonstration website :)
Nice! I have found it's more difficult to train when the model is doubled (at least in tf.keras): the error is much higher. But I haven't done much experimenting yet with optimizers, learning rates, or other hyperparameters.