tacotron
How was the LJSpeech dataset created?
As I understand it, the LJSpeech dataset was originally an audiobook. How was it prepared? i.e., what tools were used to cut it into single sentences? Is it possible to use those tools on noisier audio data, e.g. speech from YouTube?
I don't know exactly how @keithito built the data, but essentially the transcript and the audio have to be matched. The key to cutting the long audio into segmented sentences is to use the time alignment of each sentence and cut accordingly, keeping each segment to a reasonable duration or word length.
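To make the cutting step concrete, here is a minimal sketch using only Python's standard-library `wave` module. It assumes you already have per-sentence `(start_sec, end_sec)` timestamps from a forced aligner; the function name and file naming scheme are illustrative, not anything from @keithito's pipeline.

```python
import wave

def cut_segments(in_path, alignments, out_prefix):
    """Cut a long WAV file into per-sentence clips.

    `alignments` is a list of (start_sec, end_sec) pairs, e.g. produced
    by a forced aligner; this function only does the slicing.
    Returns the list of written clip paths.
    """
    out_paths = []
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end) in enumerate(alignments):
            # Seek to the segment start and read exactly its duration.
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            out_path = f"{out_prefix}{i:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels/width/rate as source
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

In practice you would also pad the boundaries slightly and drop segments that are too short or too long, since alignment timestamps are rarely sample-accurate.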
There are many tools for that (especially for English); one of them is https://github.com/readbeyond/aeneas. But note that you have to provide the transcript along with the audio.
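Once the aligner has produced a sync map, you can turn it into LJSpeech-style `id|transcript` metadata rows plus the cut points for the audio. The sketch below assumes a JSON sync map with a `fragments` list whose entries carry `begin`, `end`, and `lines` fields (the shape aeneas uses for its JSON output, as far as I know); the clip naming is my own convention, not part of any tool.

```python
import json

def sync_map_to_metadata(sync_map_json, clip_prefix="seg_"):
    """Convert an aeneas-style JSON sync map into LJSpeech-style
    'id|transcript' rows plus (start_sec, end_sec) cut points."""
    data = json.loads(sync_map_json)
    rows, cuts = [], []
    for i, frag in enumerate(data["fragments"]):
        text = " ".join(frag["lines"]).strip()
        if not text:
            continue  # skip empty fragments (e.g. leading silence)
        rows.append(f"{clip_prefix}{i:04d}|{text}")
        cuts.append((float(frag["begin"]), float(frag["end"])))
    return rows, cuts

# Hypothetical two-fragment sync map for illustration:
example = json.dumps({
    "fragments": [
        {"begin": "0.000", "end": "2.480",
         "lines": ["Printing, in the only sense"]},
        {"begin": "2.480", "end": "5.120",
         "lines": ["with which we are at present concerned,"]},
    ]
})
rows, cuts = sync_map_to_metadata(example)
```

The `cuts` list can then be fed to whatever slicing step you use on the audio, and the `rows` written out as a `metadata.csv`-style file.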
You could also try the Mozilla Common Voice dataset; it is already cut into sentences.