audio-data-pytorch
How to create a text and audio dataset
Hi!
First and foremost: congratulations on this fine collection of repositories! I am slowly working my way through them and I am amazed by how easy and effective your work is.
I will soon start some work on conditional audio generation. What would be a good starting point for creating something like a WAVDataset that yields both audio and text? Would simply extending WAVDataset be the best way?
Best, Tristan
Hi @AI-Guru, thanks a lot!
A subclass of WAVDataset with extra text metadata would be a good starting option. I personally used a WebDataset (with the custom AudioWebDataset), which basically loads a set of tar files containing numbered wav/json pairs. WebDatasets work well with a lot of data, but they are a bit more involved to set up.
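To make the first option concrete, here is a rough sketch of the idea as a standalone map-style dataset (anything with `__len__` and `__getitem__`). It uses only the Python standard library, so it is not WAVDataset itself. The class name `WavTextDataset`, the "one `.txt` caption next to each `.wav`" layout, and the returned dict keys are all assumptions made for illustration. It also only handles 16-bit PCM files.

```python
import struct
import wave
from pathlib import Path


class WavTextDataset:
    """Hypothetical map-style dataset that pairs each wav with a caption.

    Assumed layout: every `clip.wav` has a `clip.txt` beside it.
    Wav files without a caption are skipped.
    """

    def __init__(self, root):
        self.wav_paths = sorted(
            p for p in Path(root).rglob("*.wav")
            if p.with_suffix(".txt").exists()
        )

    def __len__(self):
        return len(self.wav_paths)

    def __getitem__(self, idx):
        wav_path = self.wav_paths[idx]
        with wave.open(str(wav_path), "rb") as f:
            if f.getsampwidth() != 2:
                raise ValueError("this sketch only handles 16-bit PCM")
            sample_rate = f.getframerate()
            channels = f.getnchannels()
            raw = f.readframes(f.getnframes())
        # Interleaved little-endian int16 samples, scaled to [-1, 1).
        ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
        audio = [x / 32768.0 for x in ints]
        text = wav_path.with_suffix(".txt").read_text(encoding="utf-8").strip()
        return {
            "audio": audio,
            "sample_rate": sample_rate,
            "channels": channels,
            "text": text,
        }
```

When subclassing WAVDataset instead, the same pattern should carry over: let the parent load the audio, look up the caption for the same index, and return both. The exact return type of the parent's `__getitem__` is not shown in this thread, so check it before overriding.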