Variational Autoencoder in the mel-spectrogram domain for one-shot audio synthesis

MelSpecVAE

Author: Moisés Horta Valenzuela, 2021

Website: moiseshorta.audio

Twitter: @hexorcismos

Español: Open In Colab

English: Open In Colab

MelSpecVAE is a Variational Autoencoder that synthesizes mel-spectrograms, which can then be inverted into raw audio waveforms. Currently, you can train it on any dataset of .wav audio files with a 44.1 kHz sample rate and 16-bit depth.

Listen to audio examples here: https://soundcloud.com/h-e-x-o-r-c-i-s-m-o-s/sets/melspecvae-variational

Features:

  • Interpolate between two points in the latent space and synthesize the 'in-between' sounds.
  • Generate short one-shot audio.
  • Synthesize arbitrarily long audio samples by generating seeds and sampling from the latent space. The available noise types for generating Z-vectors are uniform, Perlin, and fractal.
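
To make the latent-space features above more concrete, here is a minimal sketch of linear interpolation between two Z-vectors drawn from uniform noise. It is not taken from the MelSpecVAE code: the function names `uniform_latent` and `lerp_latents`, the latent size of 4, and the sampling range of [-1, 1] are all assumptions chosen for illustration. In practice, each vector on the path would be passed to the trained decoder to produce a mel-spectrogram.

```python
import random


def uniform_latent(dim, low=-1.0, high=1.0, seed=None):
    """Draw one Z-vector from uniform noise (hypothetical helper, assumed range)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(dim)]


def lerp_latents(z_a, z_b, steps):
    """Return `steps` Z-vectors evenly spaced from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 at z_a to 1.0 at z_b
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Two seeded points in an illustrative 4-dimensional latent space.
z_start = uniform_latent(4, seed=0)
z_end = uniform_latent(4, seed=1)

# Five Z-vectors: the two endpoints plus three 'in-between' points.
path = lerp_latents(z_start, z_end, steps=5)
```

Seeding the generator makes a given Z-vector reproducible, which is one way the "seeds" mentioned above could be used to revisit a sound. Perlin and fractal noise would replace `uniform_latent` with a smoother generator and are not shown here.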

Credits:

  • VAE neural network architecture coded following 'The Sound of AI' YouTube tutorial series by Valerio Velardo.
  • Some utility functions from Marco Pasini's MelGAN-VC Jupyter notebook.