
Which acoustic model (AM) is used in Kaldi?

Open joan126 opened this issue 3 years ago • 2 comments

Hi, I am using MFA for forced alignment between phonemes and audio. I want to know whether an nnet3 or chain model is used when training MFA from scratch. As far as I know, a TDNN in nnet3 is better for alignment.

joan126 avatar Apr 16 '21 08:04 joan126

Neither; it just uses the GMM-HMM pipeline, training through LDA+SAT. The experiments we did with nnet2 several years ago showed very slight benefits, if any, over the GMM acoustic models, which in my mind didn't justify the increase in complexity and training time. If you've found better alignment performance with nnet3, I'd be curious to see that, but from what I understand, nnet training in Kaldi generally takes the alignments from the GMM model and doesn't optimize any further on alignment.
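As a rough illustration of what "training through LDA+SAT" means, here is a minimal Python sketch of the stage order in a typical Kaldi-style GMM-HMM recipe. The stage names, feature descriptions, and the `Stage` / `gmm_pipeline` helpers are assumptions made up for this example, not MFA's actual configuration or API; the point is only that each stage starts from the alignments produced by the previous one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str                            # hypothetical stage label
    features: str                        # rough description of the features used
    init_alignments_from: Optional[str]  # stage whose alignments seed this one


def gmm_pipeline() -> List[Stage]:
    """Return an assumed monophone -> triphone -> LDA -> SAT stage order."""
    order = [
        ("monophone", "MFCC + deltas"),
        ("triphone", "MFCC + deltas"),
        ("lda", "spliced MFCC + LDA(+MLLT)"),
        ("sat", "LDA features + fMLLR speaker transforms"),
    ]
    stages: List[Stage] = []
    previous: Optional[str] = None
    for name, features in order:
        # Each stage is initialized from the alignments of the stage before it.
        stages.append(Stage(name, features, previous))
        previous = name
    return stages


if __name__ == "__main__":
    for stage in gmm_pipeline():
        source = stage.init_alignments_from or "flat start (equal alignment)"
        print(f"{stage.name:<10} features={stage.features!r} alignments from: {source}")
```

The final SAT model is the one that would produce the alignments; a neural network stage, if added, would typically be trained on top of those alignments rather than refining them.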

mmcauliffe avatar Apr 16 '21 17:04 mmcauliffe

Thanks for the reply, I get it! I used a 15-hour, high-quality TTS dataset to train from scratch. However, the alignment results are not accurate. I am wondering whether the dataset is too small. Can you give some suggestions to improve alignment accuracy?

joan126 avatar Apr 17 '21 04:04 joan126