CLAP
Add Data Augmentation
Prioritize fast, GPU-based audio augmentations that preserve the vocal content. I'd like to use nnAudio for the spectrogram computation, so augmentations implemented as PyTorch modules are ideal.
https://github.com/asteroid-team/torch-audiomentations
https://github.com/adefossez/julius
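To make the "augmentations as PyTorch modules" idea concrete, here is a minimal sketch in plain PyTorch. It does not use the API of either library linked above; `RandomGain` and `RandomPolarityInversion` are hypothetical stand-ins for the kind of waveform transforms those libraries offer, and the gain range, probabilities, and 16 kHz batch shape are assumptions chosen for illustration. Both transforms leave the spoken content intact and run on whatever device the input tensor lives on.

```python
import torch
from torch import nn


class RandomGain(nn.Module):
    """Scale each example by a random gain in dB (hypothetical helper)."""

    def __init__(self, min_gain_db: float = -12.0, max_gain_db: float = 6.0, p: float = 0.5):
        super().__init__()
        self.min_gain_db = min_gain_db
        self.max_gain_db = max_gain_db
        self.p = p

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio is assumed to have shape (batch, channels, samples).
        batch = audio.shape[0]
        gain_db = torch.empty(batch, 1, 1, device=audio.device, dtype=audio.dtype)
        gain_db.uniform_(self.min_gain_db, self.max_gain_db)
        # Decide per example whether the gain is applied at all.
        apply = torch.rand(batch, 1, 1, device=audio.device) < self.p
        gain = torch.where(apply, 10.0 ** (gain_db / 20.0), torch.ones_like(gain_db))
        return audio * gain


class RandomPolarityInversion(nn.Module):
    """Flip the sign of each example with probability p (hypothetical helper)."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        flip = torch.rand(audio.shape[0], 1, 1, device=audio.device) < self.p
        sign = 1.0 - 2.0 * flip.to(audio.dtype)  # +1 keeps the signal, -1 inverts it
        return audio * sign


# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
augment = nn.Sequential(RandomGain(), RandomPolarityInversion()).to(device)

waveforms = torch.randn(4, 1, 16000, device=device)  # assumed: 4 mono clips, 1 s at 16 kHz
augmented = augment(waveforms)
```

Because the pipeline is an ordinary `nn.Sequential`, its output could in principle be passed straight to a spectrogram layer on the same device without a round trip through the CPU.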
Forgot to mention: we can also consider augmenting the transcripts. It was suggested that we could tag them with relevant metadata, or run them through https://github.com/huggingface/torchMoji to tag them with emotive emojis.
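As a rough illustration of the metadata-tagging idea, the sketch below prepends bracketed tags to a transcript. The tag format, the `tag_transcript` helper, and the example keys (`speaker`, `emotion`) are all assumptions made here for illustration; nothing above fixes how the tags should look, and the torchMoji step is not shown.

```python
from typing import Mapping


def tag_transcript(transcript: str, metadata: Mapping[str, str]) -> str:
    """Prepend "[key: value]" tags to a transcript (hypothetical format)."""
    # Sort the keys so the same metadata always yields the same tag order.
    tags = " ".join(f"[{key}: {value}]" for key, value in sorted(metadata.items()))
    return f"{tags} {transcript}".strip()


tagged = tag_transcript("hello there", {"speaker": "female", "emotion": "happy"})
print(tagged)  # [emotion: happy] [speaker: female] hello there
```

A fixed tag order keeps the text encoder's input consistent across examples; whether tags should come before or after the transcript is an open choice.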