
Training streaming zipformer diverges at 700-1000 steps

Open · spacetronics opened this issue 9 months ago · 0 comments

I'm trying to train a streaming zipformer model on two custom datasets, the first with 160 hours of training data and the second with 37 hours. For the first run I used the default parameters, but from around step 700 onward the loss increased and the grad scale became unstable. I tried different base-lr values (0.035 and 0.025), but the issue was still there. I then lowered the base-lr to 0.015; the first 1500 steps looked fine, but then the same issue came up. Here are the graphs:

[Three TensorBoard screenshots attached, showing the loss and grad-scale curves over training steps]
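For anyone unfamiliar with the grad-scale curve: with `--use-fp16 1` the model trains under PyTorch automatic mixed precision, where a dynamic loss-scale factor is halved whenever a step produces inf/nan gradients and slowly grows back otherwise. Below is a minimal, illustrative sketch of that mechanism using PyTorch's stock `GradScaler`; the model and numbers are placeholders, not the actual recipe code (icefall's zipformer recipe uses ScaledAdam and its own training loop):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Placeholder model/optimizer; stand-ins for the zipformer + ScaledAdam.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=0.015)
scaler = GradScaler()  # maintains the dynamic loss scale

for step in range(1000):
    x = torch.randn(8, 80, device="cuda")
    with autocast():               # fp16 forward pass
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # scale the loss so small grads survive fp16
    scaler.step(optimizer)         # skips the update if grads contain inf/nan
    scaler.update()                # on overflow: halve the scale; otherwise grow it
```

A grad scale that keeps collapsing instead of stabilising usually means the model itself is emitting inf/nan gradients, rather than AMP being at fault.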

My command:

```
! export PYTHONPATH=/content/icefall:$PYTHONPATH && \
  cd /content/icefall/egs/librispeech/ASR && \
  ./zipformer/train.py \
  --num-epochs 30 \
  --start-epoch 2 \
  --use-fp16 1 \
  --exp-dir /content/drive/MyDrive/zipformer/exp \
  --bpe-model /content/drive/MyDrive/AI/lang_bpe_500/bpe.model \
  --causal 1 \
  --save-every-n 2500 \
  --base-lr 0.015 \
  --full-libri 0 \
  --mini-libri 0 \
  --bucketing-sampler 0 \
  --max-duration 300
```

This is also where the "Parameter dominating tot_sumsq" and "Parameters with most larger-than-usual grads" warnings come up every time. I filtered my cuts to remove long and short utterances (shorter than 1 second and longer than 20 seconds). My max duration is 300 seconds. I used ZipSampler to combine the datasets. Can someone guide me on what's happening here? Should I lower my base-lr again?
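For reference, this is roughly how the duration filtering and the ZipSampler combination look with lhotse; the manifest paths and the per-dataset max-duration split below are illustrative placeholders, not my exact setup:

```python
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler, ZipSampler

# Hypothetical manifest paths for the two custom datasets.
cuts_a = CutSet.from_file("data/fbank/custom160h_cuts_train.jsonl.gz")
cuts_b = CutSet.from_file("data/fbank/custom37h_cuts_train.jsonl.gz")

# Drop utterances shorter than 1 s or longer than 20 s.
keep = lambda c: 1.0 <= c.duration <= 20.0
cuts_a = cuts_a.filter(keep)
cuts_b = cuts_b.filter(keep)

# Zip the two samplers so each mini-batch mixes both datasets.
# The per-sampler max_duration values should add up to the
# overall --max-duration budget (300 s here).
sampler = ZipSampler(
    SimpleCutSampler(cuts_a, max_duration=240, shuffle=True),
    SimpleCutSampler(cuts_b, max_duration=60, shuffle=True),
)
```

With `merge_batches` left at its default, each yielded batch concatenates one mini-batch from each inner sampler, so the effective batch duration is the sum of the two `max_duration` limits.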

spacetronics · Feb 26 '25 08:02