librispeech-alignments
How was the aligner configured?
I tried to replicate these alignments, but my timestamps came out different from yours. Were you using the default configuration, or did you make any changes?
Thanks!
Which alignments did you check? I used default parameters for the TextGrid ones, but I applied a cleaning script to get the txt ones. Unfortunately, I no longer have that script.
I checked both just to be sure. I noticed the discrepancy after trying to run my own cleaned alignments through the synthesizer preprocessing script in your sv2tts repo. There's an assertion in the split_on_silences function that checks whether the first and last words are silences, and that's where it errored out.
I'm wondering whether updates to MFA may have caused this.
Ah right, sorry, I didn't remember that until you mentioned it. Yes, I normalized everything so that each sentence starts and ends with a silence, even if it's a 0-duration one. It was just for convenience; I can't really remember why.
Silences are represented as empty words, e.g. in the first sentence there is a silence from 0s to 0.49s and the word 'GO' is pronounced from 0.49s to 0.89s. Each sentence is guaranteed to start and end with a silence, even if its duration is 0. This is for parsing convenience.
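For anyone trying to reproduce that convention, here is a minimal Python sketch of the padding step only. It is not the original cleaning script (which is no longer available); the `(word, start, end)` tuple layout and the function name `pad_with_silences` are assumptions made for illustration, and mapping the aligner's own silence labels to empty words is left out.

```python
from typing import List, Tuple

# Assumed layout for illustration: (word, start_seconds, end_seconds).
# An empty string marks a silence, as described above.
Interval = Tuple[str, float, float]


def pad_with_silences(intervals: List[Interval]) -> List[Interval]:
    """Return a copy that starts and ends with a silence interval.

    A 0-duration silence is added at either end when the first or
    last interval is a spoken word.
    """
    if not intervals:
        # Nothing was aligned: a single empty silence keeps the guarantee.
        return [("", 0.0, 0.0)]

    padded = list(intervals)

    first_word, first_start, _ = padded[0]
    if first_word != "":
        # Leading 0-duration silence at the start time of the first word.
        padded.insert(0, ("", first_start, first_start))

    last_word, _, last_end = padded[-1]
    if last_word != "":
        # Trailing 0-duration silence at the end time of the last word.
        padded.append(("", last_end, last_end))

    return padded


if __name__ == "__main__":
    example = [("", 0.0, 0.49), ("GO", 0.49, 0.89)]
    print(pad_with_silences(example))
```

With this padding applied, a check that the first and last words are silences should pass regardless of whether the aligner emitted silences at the sentence boundaries.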
Do you happen to recall (generally) how you normalized those sentences to determine those silences?
Hi @sjmelsom, I want to create alignments for the VCTK dataset, and I have no idea how to use MFA. Would you be able to share your work on creating the alignments?
@stray128 I would recommend that you head over to the MFA docs. They provide a reasonable amount of material that will help you get started.
Thanks, @sjmelsom. I also have another question for you: did you train the synthesizer model on the alignments you generated? If so, were the results good once you went past the number of steps that @CorentinJ trained for?