audio-data-pytorch

How to create a text and audio dataset

Open • AI-Guru opened this issue 2 years ago • 1 comment

Hi!

First and foremost: congratulations on this fine collection of repositories! I am slowly working my way through them, and I am amazed at how easy to use and effective your work is.

I will soon start working on conditional audio generation. What would be a good starting point for creating something like a WAVDataset that yields both audio and text? Would the best approach be to simply extend WAVDataset?

Best, Tristan

AI-Guru, Apr 05 '23 18:04

Hi @AI-Guru, thanks a lot!

A subclass of WAVDataset that also returns text metadata would be a good starting point. I personally used a WebDataset (with the custom AudioWebDataset), which loads a set of tar files containing numbered pairs of .wav/.json files. WebDatasets scale well to large amounts of data, but they are a bit more involved to set up.
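Below is a minimal sketch of the tar layout described above. It assumes mono 16-bit PCM audio and a JSON payload with a single `text` field; both are assumptions, not the format AudioWebDataset actually expects. The sketch uses only the Python standard library (`tarfile`, `wave`, `json`) rather than the `webdataset` package or the repo's AudioWebDataset, whose APIs are not reproduced here. The helper names `write_shard` and `iter_shard` are hypothetical. Members that share a numbered key such as `000000` are grouped into one (audio, text) sample, similar to how WebDataset groups files by basename.

```python
import io
import json
import sys
import tarfile
import wave
from array import array
from typing import Dict, Iterator, List, Sequence, Tuple

SAMPLE_RATE = 16000  # assumption: mono 16-bit PCM at 16 kHz


def encode_wav(samples: Sequence[int], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono 16-bit PCM samples as an in-memory .wav file."""
    pcm = array("h", samples)
    if sys.byteorder == "big":  # .wav PCM data is little-endian
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()


def decode_wav(data: bytes) -> List[int]:
    """Decode a mono 16-bit PCM .wav payload back into a list of samples."""
    with wave.open(io.BytesIO(data), "rb") as r:
        pcm = array("h", r.readframes(r.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()
    return pcm.tolist()


def write_shard(path: str, items: Sequence[Tuple[Sequence[int], str]]) -> None:
    """Hypothetical helper: store (samples, caption) pairs as 000000.wav / 000000.json members."""
    with tarfile.open(path, "w") as tar:
        for i, (samples, text) in enumerate(items):
            key = f"{i:06d}"
            members = (
                (f"{key}.wav", encode_wav(samples)),
                (f"{key}.json", json.dumps({"text": text}).encode("utf-8")),
            )
            for name, payload in members:
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def _to_pair(sample: Dict[str, bytes]) -> Tuple[List[int], str]:
    """Turn one grouped {ext: bytes} sample into (audio samples, caption)."""
    return decode_wav(sample["wav"]), json.loads(sample["json"])["text"]


def iter_shard(path: str) -> Iterator[Tuple[List[int], str]]:
    """Hypothetical helper: yield (samples, caption), grouping consecutive members by key."""
    with tarfile.open(path, "r") as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")  # "000000.wav" -> ("000000", "wav")
            if key != current_key and sample:
                yield _to_pair(sample)
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield _to_pair(sample)
```

For a small experiment, the same `(audio, text)` pairing could live in a WAVDataset subclass that reads a sidecar .json next to each .wav. The tar-shard route mainly pays off once the data no longer fits comfortably as loose files.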

flavioschneider, Apr 06 '23 17:04