Training streaming zipformer diverges at 700-1000 steps
I'm trying to train a streaming zipformer model on 2 custom datasets: the first with 160 hours of training data and the second with 37 hours. For the first run I used the default parameters, but from around step 700 onward the loss increased and the grad scale seemed unstable. I tried different base-lr values (0.035, 0.025), but the issue was still there. I then lowered the base-lr to 0.015; the first 1500 steps looked fine, but then the same issue came up. Here's the graph:
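For context on what those base-lr values mean at steps 700-1500, here's my understanding of the batch-wise part of the Eden schedule that zipformer's train.py uses (formula from icefall's optim.py; I'm assuming the default --lr-batches of 7500 and ignoring the epoch and warmup factors, so treat this as an approximation):

```python
# Batch-wise factor of icefall's Eden schedule (from optim.py); the epoch
# and warmup factors are omitted, so this is only an approximation.
def eden_lr(base_lr: float, step: int, lr_batches: float = 7500.0) -> float:
    return base_lr * ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25

for base_lr in (0.035, 0.025, 0.015):
    print(base_lr, [round(eden_lr(base_lr, s), 5) for s in (700, 1000, 1500)])
# 0.035 [0.03492, 0.03485, 0.03466]
# 0.025 [0.02495, 0.02489, 0.02476]
# 0.015 [0.01497, 0.01493, 0.01485]
```

If I read this right, the schedule has barely decayed at the point where training blows up, so the model is still seeing essentially the full base-lr, which would explain why lowering it only delays the divergence.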
My command:
```bash
! export PYTHONPATH=/content/icefall:$PYTHONPATH && \
cd /content/icefall/egs/librispeech/ASR && \
./zipformer/train.py \
  --num-epochs 30 \
  --start-epoch 2 \
  --use-fp16 1 \
  --exp-dir /content/drive/MyDrive/zipformer/exp \
  --bpe-model /content/drive/MyDrive/AI/lang_bpe_500/bpe.model \
  --causal 1 \
  --save-every-n 2500 \
  --base-lr 0.015 \
  --full-libri 0 \
  --mini-libri 0 \
  --bucketing-sampler 0 \
  --max-duration 300
```
This is also where the "Parameter dominating tot_sumsq" and "Parameters with most larger-than-usual grads" warnings came up every time. I filtered my cuts to remove long and short utterances (shorter than 1 second and longer than 20 seconds). My max duration is 300 seconds, and I used ZipSampler to combine the two datasets (roughly as in the sketch below). Can someone guide me on what's happening here? Should I lower my base-lr again?
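For reference, here's roughly how I filter the cuts and combine the two datasets. This is a simplified sketch: the manifest paths are placeholders, and I use SimpleCutSampler inside ZipSampler since I run with --bucketing-sampler 0.

```python
# Rough sketch of the cut filtering and ZipSampler setup (Lhotse).
# The manifest paths and the max_duration split are placeholders.
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler, ZipSampler

cuts_a = CutSet.from_file("data/fbank/dataset_a_cuts_train.jsonl.gz")  # ~160 h
cuts_b = CutSet.from_file("data/fbank/dataset_b_cuts_train.jsonl.gz")  # ~37 h

# Keep only utterances between 1 and 20 seconds.
def keep(c):
    return 1.0 <= c.duration <= 20.0

cuts_a = cuts_a.filter(keep)
cuts_b = cuts_b.filter(keep)

# One sampler per dataset; ZipSampler draws one mini-batch from each and
# merges them, so the effective batch duration is the sum of the two
# max_duration values.
train_sampler = ZipSampler(
    SimpleCutSampler(cuts_a, max_duration=150.0, shuffle=True),
    SimpleCutSampler(cuts_b, max_duration=150.0, shuffle=True),
)
```

One detail worth noting: since ZipSampler merges one mini-batch from each inner sampler, the per-sampler max_duration values add up, which is why the 300 s budget is split between the two samplers here.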