FunASR
Paraformer finetune: inconsistent convergence between dataset_type small and large
Thanks to the Alibaba team for open-sourcing the FunASR project: in our tests Paraformer's generalization is indeed better than some other vendors in the industry, which deserves recognition!!! So we finetuned your open-sourced model on AISHELL-1 (with a CTC head attached, CTC weight 0.3) and observed a strange phenomenon: when running train.py with "--dataset_type small", the CTC branch converges normally and the final CER reaches 2.32% (CTC output); but with "--dataset_type large", the CTC branch fails to converge and training accuracy is very poor, as shown in the attached figure.
Looking forward to your reply. Thx
I use aishell-1 data to finetune the paraformer-large offline model with CTC, setting dataset_type to small and large; the training logs are as follows, you can refer to them.
Thanks for the reply! Did you use the aishell-1 data to generate the mvn? On our side we train with the mvn from the original paraformer-large offline model.
BTW, could you also share the config.yaml you used for training?
With the original mvn, training converges normally when using small, but does not converge when using large; we finetune on 3 V100 GPUs.
I use the finetune.py recipe to finetune the paraformer-large offline model; for the configuration you can refer to config.yaml and set ctc_weight to 0.3, and the mvn file has not been modified.
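For readers following along, a minimal sketch of where ctc_weight would sit in an ESPnet/FunASR-style config.yaml; only ctc_weight=0.3 is confirmed in this thread, the surrounding keys and values are illustrative assumptions rather than the exact configuration used above:

```yaml
# Minimal sketch of the relevant fragment of config.yaml.
# Only ctc_weight is confirmed in this thread; the other keys/values are illustrative assumptions.
model_conf:
    ctc_weight: 0.3               # weight of the CTC branch in the joint loss
    lsm_weight: 0.1               # label smoothing for the attention loss (assumed)
    length_normalized_loss: true  # assumed default
    predictor_weight: 1.0         # weight of the predictor (pre) loss (assumed)
```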
Thanks, luo. We will test it and report the results back in this issue. Thx
The CTC loss only starts to drop around epoch 54, and training only becomes normal from that point on. Could the learning rate be too low?
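If the learning rate is the suspect, the warmup schedule in config.yaml is the first thing to check: with a warmup scheduler the effective learning rate stays low for many steps, which can look like the CTC branch "not converging". A minimal sketch of the relevant section, assuming ESPnet/FunASR-style optimizer and scheduler keys; the values shown are illustrative assumptions, not the ones from this run:

```yaml
# Optimizer/scheduler fragment of config.yaml (assumed ESPnet/FunASR-style keys; values illustrative).
optim: adam
optim_conf:
    lr: 0.0005           # peak learning rate reached once warmup finishes
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000  # effective lr ramps up until roughly this many steps
```

In the log further down, total_count grows by roughly 1.2k iterations per epoch, so a warmup of tens of thousands of steps would span dozens of epochs; whether that matches this run depends on the actual scheduler_conf used.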
You can refer to my configuration and quickly verify it.
Okay, thanks for your kind reply.
I use the finetune.py recipe to finetune the paraformer-large offline model; for the configuration you can refer to config.yaml and set ctc_weight to 0.3, and the mvn file has not been modified.
=============== With your model configuration, the CTC head does converge quickly, but the final CTC test accuracy is roughly 10 points or more worse than the decoder output. What could be causing this? The training log is below; the problem appears after epoch 20.
INFO: 17epoch results: [train] iter_time=0.044, forward_time=1.020, loss_ctc=5.371, loss_att=0.263, acc=0.905, loss_pre=0.035, loss=1.830, backward_time=0.706, optim_step_time=0.076, optim0_lr0=8.513e-04, train_time=7.967, time=41 minutes and 3.77 seconds, total_count=21085, gpu_max_cached_mem_GB=30.074, [valid] loss_ctc=4.719, cer_ctc=0.150, loss_att=0.205, pre_loss_att=nan, acc=0.917, cer=0.063, wer=0.281, loss_pre=0.035, loss=1.594, time=1 minute and 16.81 seconds, total_count=1190, gpu_max_cached_mem_GB=30.074
INFO: 18epoch results: [train] iter_time=0.039, forward_time=1.038, loss_ctc=5.795, loss_att=0.289, acc=0.893, loss_pre=0.042, loss=1.982, backward_time=0.705, optim_step_time=0.077, optim0_lr0=9.027e-04, train_time=7.976, time=40 minutes and 50.98 seconds, total_count=22315, gpu_max_cached_mem_GB=30.074, [valid] loss_ctc=4.879, cer_ctc=0.155, loss_att=0.239, pre_loss_att=nan, acc=0.908, cer=0.070, wer=0.302, loss_pre=0.042, loss=1.673, time=1 minute and 16.54 seconds, total_count=1260, gpu_max_cached_mem_GB=30.074
INFO: 19epoch results: [train] iter_time=0.046, forward_time=1.029, loss_ctc=5.312, loss_att=0.278, acc=0.898, loss_pre=0.039, loss=1.827, backward_time=0.706, optim_step_time=0.075, optim0_lr0=9.543e-04, train_time=7.974, time=41 minutes and 37.82 seconds, total_count=23569, gpu_max_cached_mem_GB=30.074, [valid] loss_ctc=4.814, cer_ctc=0.150, loss_att=0.226, pre_loss_att=nan, acc=0.914, cer=0.066, wer=0.289, loss_pre=0.035, loss=1.638, time=1 minute and 17.4 seconds, total_count=1330, gpu_max_cached_mem_GB=30.074
INFO: 20epoch results: [train] iter_time=0.040, forward_time=1.034, loss_ctc=6.042, loss_att=0.315, acc=0.879, loss_pre=0.051, loss=2.085, backward_time=0.693, optim_step_time=0.066, optim0_lr0=0.001, train_time=7.987, time=41 minutes and 34.12 seconds, total_count=24819, gpu_max_cached_mem_GB=30.074, [valid] loss_ctc=5.630, cer_ctc=0.172, loss_att=0.249, pre_loss_att=nan, acc=0.902, cer=0.073, wer=0.323, loss_pre=0.040, loss=1.903, time=1 minute and 15.98 seconds, total_count=1400, gpu_max_cached_mem_GB=30.074
INFO: 21epoch results: [train] iter_time=0.042, forward_time=1.058, loss_ctc=6.162, loss_att=0.324, acc=0.875, loss_pre=0.052, loss=2.128, backward_time=0.692, optim_step_time=0.066, optim0_lr0=0.001, train_time=7.944, time=40 minutes and 33.02 seconds, total_count=26045, gpu_max_cached_mem_GB=30.074, [valid] loss_ctc=5.668, cer_ctc=0.183, loss_att=0.270, pre_loss_att=nan, acc=0.891, cer=0.080, wer=0.342, loss_pre=0.049, loss=1.938, time=1 minute and 16.94 seconds, total_count=1470, gpu_max_cached_mem_GB=30.074