ETH Zürich MSc Thesis: Accelerating Neural Audio Synthesis
The DDSP and NEWT papers don't mention data augmentation, but RAVE does: "We use dequantization, random crop and allpass filters with random coefficients as our data augmentation strategy."
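To make the three augmentations in the RAVE quote concrete, here is a minimal NumPy sketch. It is not RAVE's implementation: the function names, the 16-bit assumption, the first-order allpass structure, and the coefficient range are all assumptions chosen for illustration.

```python
import numpy as np


def dequantize(x, bits=16, rng=None):
    """Add uniform noise of at most half a quantization step (assumes x in [-1, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    lsb = 1.0 / 2 ** (bits - 1)  # size of one quantization step
    return x + (rng.random(x.shape) - 0.5) * lsb


def random_crop(x, n, rng=None):
    """Return n consecutive samples starting at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, len(x) - n + 1)  # upper bound is exclusive
    return x[start:start + n]


def random_allpass(x, max_coef=0.9, rng=None):
    """First-order allpass H(z) = (a + z^-1) / (1 + a z^-1) with random a.

    The magnitude response is flat; only the phase changes.
    The range of a is a hypothetical choice, kept inside (-1, 1) for stability.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.uniform(-max_coef, max_coef)
    y = np.zeros_like(x)
    x_prev = 0.0
    y_prev = 0.0
    for i, xi in enumerate(x):
        y[i] = a * xi + x_prev - a * y_prev
        x_prev, y_prev = xi, y[i]
    return y
```

In a training pipeline these would typically be chained per example, for instance `random_allpass(dequantize(random_crop(x, n)))`, but the order and parameters used by RAVE are not stated in the quote.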
Neural audio synthesis models are hard to compare automatically, so a survey will be needed to show that our speedups did not reduce quality.
TorchScript's performance seemed not to improve when forcing it to use both vCPUs, and DeepSparse [explicitly chooses not to use both](https://github.com/neuralmagic/deepsparse/issues/459). What if we change the number of CPUs?
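One way to probe this question is to time the same inference call under different thread counts. The sketch below is a generic harness using only the standard library; `sweep_threads`, `run_once`, and `set_threads` are hypothetical names, and it varies the thread count only, not the number of vCPUs the machine exposes.

```python
import time
from statistics import median


def sweep_threads(run_once, set_threads, counts, repeats=5):
    """Return {thread_count: median seconds per call of run_once}."""
    results = {}
    for n in counts:
        set_threads(n)  # e.g. torch.set_num_threads when using PyTorch
        run_once()      # warm-up call, not timed
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_once()
            times.append(time.perf_counter() - t0)
        results[n] = median(times)
    return results
```

With PyTorch, `set_threads` could be `torch.set_num_threads` and `run_once` a closure that calls the scripted model on a fixed input. Testing a different number of CPUs would mean rerunning the sweep on a differently sized instance.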
RAVE (and [Multiband MelGAN](https://arxiv.org/pdf/2005.05106.pdf) too) feeds the raw single-band waveform to the discriminator. Wouldn't it make sense to use multiband decomposition for the discriminator as well?