RAVE
Support for multi-channel audio
Hi!
I've added an option to train the network on stereo material. The hypothesis is that extra channels on just the first and last conv layers are enough, because the L/R channels are highly correlated.
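A minimal sketch of the idea (not the actual RAVE architecture; layer sizes and kernel widths here are illustrative): only the first and last convolutions are widened to two channels, while the core of the network keeps its original width.

```python
import torch
import torch.nn as nn

class StereoEdges(nn.Module):
    """Illustrative stack: stereo only at the input/output convolutions."""
    def __init__(self, channels=2, hidden=16):
        super().__init__()
        self.inp = nn.Conv1d(channels, hidden, 7, padding=3)  # stereo in
        self.mid = nn.Conv1d(hidden, hidden, 7, padding=3)    # unchanged core
        self.out = nn.Conv1d(hidden, channels, 7, padding=3)  # stereo out

    def forward(self, x):  # x: (batch, 2, time)
        x = torch.relu(self.inp(x))
        x = torch.relu(self.mid(x))
        return self.out(x)

x = torch.randn(1, 2, 1024)
y = StereoEdges()(x)
print(y.shape)  # -> torch.Size([1, 2, 1024])
```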
Here are some examples, trained on 60 mins of drum loops for ~1.4m steps:
[Audio examples: original, encoded, random sampling from the z-space]
Training speed is:
- 8it/sec for mono
- 6it/sec for stereo
That's on my 3090 with batch size 8.
The code works both with and without PQMF. Here is the same example encoded by a model trained for ~1.4M steps with `--data-size 1`:
[Audio example: encoded]
The export code for RAVE and prior training also works.
Note that I haven't modified export_prior/combine_models, because as I understand it this is not yet implemented. Also note that the model is not backward compatible: it now expects and returns a 3-D tensor, even for mono audio.
Any chance you could share the code for inferring "Random sampling from a z-space" ? Thanks!
Hey
Something like this:
```python
import torch as t
import soundfile as sf

synth = t.jit.load("path/to/your/model.ts")  # your exported model
z = t.randn(1, 8, 300)                       # (batch, latent_dims, steps)
output = synth.decode(z).squeeze(0).squeeze(0).detach().numpy()
sf.write("output.wav", output, 48000)
```
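For a stereo model the decoder output keeps its channel dimension, `(batch, 2, time)`, while soundfile expects `(frames, channels)`. A sketch of the reshaping, with a random tensor standing in for `synth.decode(z)`:

```python
import torch as t

# Stand-in for the decoder output of a stereo model: (batch, 2, time).
audio = t.randn(1, 2, 48000)

# soundfile wants (frames, channels): drop the batch dim and transpose.
stereo = audio.squeeze(0).T.detach().numpy()  # -> (48000, 2)
print(stereo.shape)
# sf.write("output.wav", stereo, 48000)
```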
@caillonantoine Hey! So what do you think? Any chance to merge?
Hi, thank you for your work! I will study your PR in the following days and will get back to you :)
I'm gonna close this PR as the codebase is about to change significantly. Sorry for the wait, and thanks for the contribution!