Montreal-Forced-Aligner: Which AM is used in Kaldi?

Open joan126 opened this issue 3 years ago • 2 comments

Hi, I am using MFA to force-align phonemes with audio. I want to know whether an nnet3 or a chain model is used when training MFA from scratch. As far as I know, a TDNN in nnet3 is better for alignment.

Apr 16 '21 08:04 joan126

Neither; it just uses the GMM-HMM pipeline for training, through LDA+SAT. The experiments we did with nnet2 several years ago showed very slight benefits, if any, over the GMM acoustic models, which in my mind didn't justify the increase in complexity and training time. If you've found better alignment performance with nnet3, I'd be curious to see that, but from what I understand, nnet training in Kaldi generally takes the alignments from the GMM model and doesn't optimize alignment any further.

Apr 16 '21 17:04 mmcauliffe
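As a rough illustration of the feature transforms behind "LDA+SAT": Kaldi-style recipes typically splice each MFCC frame with its neighbours before estimating an LDA projection, and SAT then applies a per-speaker affine (fMLLR) transform of the form y = A x + b. The sketch below is not MFA or Kaldi code. The ±3-frame context is a common Kaldi default and is assumed here rather than confirmed for MFA; the function names `splice_frames` and `affine` and the toy selection matrix are hypothetical stand-ins for trained LDA/fMLLR matrices.

```python
from typing import List

Frame = List[float]


def splice_frames(feats: List[Frame], left: int = 3, right: int = 3) -> List[Frame]:
    """Concatenate each frame with its left/right neighbours.

    Indices past either edge are clamped, so the first/last frame is
    repeated; each output frame has (left + 1 + right) * dim values.
    """
    n = len(feats)
    spliced: List[Frame] = []
    for t in range(n):
        row: Frame = []
        for offset in range(-left, right + 1):
            # Clamp to [0, n - 1] to replicate edge frames.
            idx = min(max(t + offset, 0), n - 1)
            row.extend(feats[idx])
        spliced.append(row)
    return spliced


def affine(frame: Frame, matrix: List[Frame], bias: Frame) -> Frame:
    """Apply y = A x + b, the shape of an LDA projection or an fMLLR transform."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("matrix columns must match the frame dimension")
    return [sum(a * x for a, x in zip(row, frame)) + b for row, b in zip(matrix, bias)]


# Toy input: 5 frames of 2-dim "MFCCs"; a +/-1 context keeps the output small.
mfcc = [[float(t), float(t) * 10] for t in range(5)]
spliced = splice_frames(mfcc, left=1, right=1)
print(len(spliced[0]))  # 6
print(spliced[0])       # [0.0, 0.0, 0.0, 0.0, 1.0, 10.0]

# Hypothetical 2x6 matrix that just selects the centre frame; a real LDA
# matrix would be estimated from class (HMM state) statistics.
select_centre = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
print(affine(spliced[2], select_centre, [0.0, 0.0]))  # [2.0, 20.0]
```

Clamping at the edges keeps every spliced vector the same length, which a fixed-size projection matrix requires.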

Thanks for the reply, I get it! I used a 15-hour, high-quality TTS dataset to train from scratch. However, the alignment results are not accurate. I am wondering whether the dataset is too small. Can you give some suggestions to improve alignment accuracy?

Apr 17 '21 04:04 joan126