SpeechT5
How long does the hidden unit tokenizer take to converge?
I am training the hidden unit tokenizer to predict speech units from text token IDs. The model's accuracy keeps increasing, but I cannot tell whether it will eventually converge: after 3 days of training it is at 31.2% accuracy. Since this is essentially a FastSpeech model, I expected it to converge much faster. Could you share your training times, loss curves, and so on? Any information would be helpful!