RAVE
Support for multi-channel audio
Hi!
I've added an option to train the network on stereo material. The hypothesis is that extra channels on just the first and last conv layers are enough, because the L/R channels are highly correlated.
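A minimal sketch of the idea (not the actual RAVE architecture; layer sizes and kernel widths here are illustrative): only the first and last convolutions are widened to two channels, while the core of the network keeps its original width.

```python
import torch
import torch.nn as nn

class StereoEdges(nn.Module):
    """Illustrative stack: stereo only at the input/output convolutions."""
    def __init__(self, channels=2, hidden=16):
        super().__init__()
        self.inp = nn.Conv1d(channels, hidden, 7, padding=3)  # stereo in
        self.mid = nn.Conv1d(hidden, hidden, 7, padding=3)    # unchanged core
        self.out = nn.Conv1d(hidden, channels, 7, padding=3)  # stereo out

    def forward(self, x):  # x: (batch, 2, time)
        x = torch.relu(self.inp(x))
        x = torch.relu(self.mid(x))
        return self.out(x)

x = torch.randn(1, 2, 1024)
y = StereoEdges()(x)
print(y.shape)  # -> torch.Size([1, 2, 1024])
```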
Here are some examples, trained on 60 mins of drum loops for ~1.4m steps:
[Audio examples: original, encoded, random sampling from the z-space]
Training speed is:
- 8it/sec for mono
- 6it/sec for stereo
That's on my 3090 with batch size 8.
The code works both with and without PQMF. Here is the same example encoded by a model trained for ~1.4M steps with `--data-size 1`:
[Audio example: encoded]
The export code for RAVE and prior training also works.
Note that I haven't modified export_prior/combine_models, because as I understand it this is not yet implemented. Also note that the model is not backward compatible: it now expects and returns a 3-D tensor, even for mono audio.
Any chance you could share the code for inferring "Random sampling from a z-space" ? Thanks!
Hey
Something like this:
```python
import torch as t
import soundfile as sf

synth = t.jit.load("path/to/your/model.ts")  # your exported model
z = t.randn(1, 8, 300)                       # (batch, latent_dims, steps)
output = synth.decode(z).squeeze(0).squeeze(0).detach().numpy()
sf.write("output.wav", output, 48000)
```
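For a stereo model the decoder output keeps its channel dimension, `(batch, 2, time)`, while soundfile expects `(frames, channels)`. A sketch of the reshaping, with a random tensor standing in for `synth.decode(z)`:

```python
import torch as t

# Stand-in for the decoder output of a stereo model: (batch, 2, time).
audio = t.randn(1, 2, 48000)

# soundfile wants (frames, channels): drop the batch dim and transpose.
stereo = audio.squeeze(0).T.detach().numpy()  # -> (48000, 2)
print(stereo.shape)
# sf.write("output.wav", stereo, 48000)
```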
@caillonantoine Hey! So what do you think? Any chance to merge?
Hi, thank you for your work! I will study your PR in the following days and will get back to you :)
I'm gonna close this PR as the codebase is about to change significantly. Sorry for the wait, and thanks for the contribution!