Montreal-Forced-Aligner
Which acoustic model is used in Kaldi?
Hi, I am using MFA for forced alignment between phonemes and audio. When training MFA from scratch, does it use an nnet3 model or a chain model? As far as I know, a TDNN in nnet3 is better for alignment.
Neither; it just uses the GMM-HMM pipeline, training through LDA+SAT. The experiments we did with nnet2 several years ago showed very slight benefits, if any, over the GMM acoustic models, and in my mind those didn't justify the increase in complexity and training time. If you've found better alignment performance with nnet3, I'd be curious to see it, but from what I understand, nnet training in Kaldi generally takes its alignments from the GMM model and doesn't optimize alignment any further.
Thanks for the reply, I get it! I used a 15-hour, high-quality TTS dataset to train from scratch, but the alignment results are not accurate. Is the dataset too small? Can you give some suggestions for improving alignment accuracy?