
How was the LJSpeech dataset created?

Open mrgloom opened this issue 6 years ago • 2 comments

As I understand it, the LJSpeech dataset was originally an audiobook. How was it prepared, i.e. what tools were used to cut it into single sentences? Is it possible to use these tools on noisier audio data, e.g. speech from YouTube?

mrgloom · Jul 19, 2019

I don't know exactly how @keithito built the data, but essentially the transcript and the audio have to be matched. The key to cutting the long audio into segmented sentences is obtaining a time alignment for each sentence and then cutting the audio accordingly, into clips of reasonable duration or word length.

There are many tools to do that (especially for English); one of them is https://github.com/readbeyond/aeneas. Note that you have to provide the transcript along with the audio.
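For illustration, here is a rough sketch of that workflow, not the actual LJSpeech pipeline: force-align a long recording to its transcript with aeneas, then slice the audio at the aligned sentence boundaries with pydub. The config string follows the example in the aeneas documentation, all paths are placeholders, and the slicing step assumes ffmpeg is available.

```python
# Minimal sketch (untested): align a long recording to its transcript with aeneas,
# then cut it into per-sentence clips. All paths are placeholders.
import json
import os

from aeneas.executetask import ExecuteTask
from aeneas.task import Task
from pydub import AudioSegment  # requires ffmpeg for mp3 input

# 1. Forced alignment: transcript.txt holds one sentence per line.
config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config)
task.audio_file_path_absolute = u"/path/to/audiobook.mp3"
task.text_file_path_absolute = u"/path/to/transcript.txt"
task.sync_map_file_path_absolute = u"/path/to/syncmap.json"
ExecuteTask(task).execute()
task.output_sync_map_file()

# 2. Cut the audio at the aligned sentence boundaries from the sync map.
audio = AudioSegment.from_file("/path/to/audiobook.mp3")
with open("/path/to/syncmap.json") as f:
    fragments = json.load(f)["fragments"]

os.makedirs("clips", exist_ok=True)
for i, frag in enumerate(fragments):
    start_ms = int(float(frag["begin"]) * 1000)
    end_ms = int(float(frag["end"]) * 1000)
    clip = audio[start_ms:end_ms]  # pydub slices by milliseconds
    clip.export(f"clips/sentence_{i:05d}.wav", format="wav")
```

For noisy data such as YouTube speech, the alignment tends to be less reliable, so you would typically inspect or filter the resulting clips (e.g. by duration or alignment confidence) before using them for training.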

liberocks · Aug 13, 2019

Try also the Mozilla Common Voice dataset; it is already cut into sentences.

japita-se · Aug 22, 2019