icefall
Issues with long cutset during zipformer training
Hello All,
We are training a Zipformer model on about 3400 hours of Tamil data. This is in reference to https://github.com/k2-fsa/icefall/issues/1751
We have an NVIDIA A6000 50GB GPU and are getting the error below:
2024-12-19 16:44:26,604 INFO [asr_datamodule.py:375] About to create dev dataloader
2024-12-19 16:44:26,604 INFO [train.py:1326] Sanity check -- see if any of the batches in epoch 1 would cause OOM.
/home/armuser/anaconda3/envs/k2/lib/python3.8/site-packages/lhotse/dataset/sampling/dynamic.py:342: UserWarning: We have exceeded the max_duration constraint during sampling but have only 1 cut. This is likely because max_duration was set to a very low value ~10s, or you're using a CutSet with very long cuts (e.g. 100s of seconds long).
warnings.warn(
2024-12-19 16:58:57,481 ERROR [train.py:1345] Your GPU ran out of memory with the current max_duration setting. We recommend decreasing max_duration and trying again.
Failing criterion: single_longest_cut (=162.26) ...
2024-12-19 16:58:57,482 INFO [train.py:1304] Saving batch to zipformer/exp/batch-6c307511-b2b9-437a-28df-6ec4ce4a2bbd.pt
2024-12-19 16:59:01,314 INFO [train.py:1310] features shape: torch.Size([7, 16226, 80])
2024-12-19 16:59:01,315 INFO [train.py:1314] num tokens: 817
Traceback (most recent call last):
File "./zipformer/train.py", line 1380, in
Training command:
./zipformer/train.py --world-size 1 --num-epochs 30 --start-batch 336000 --use-fp16 1 --exp-dir zipformer/exp --max-duration 100
We had initially set the max duration to 150. The training had completed 4 epochs, then we got the above error. We loaded the batch.pt file and it contains the following:
'sequence_idx': tensor([0, 1, 2, 3, 4, 5, 6], dtype=torch.int32), 'start_frame': tensor([0, 0, 0, 0, 0, 0, 0], dtype=torch.int32), 'num_frames': tensor([16226, 2022, 1926, 1744, 1716, 1676, 1613], dtype=torch.int32), 'cut': [MonoCut(id='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', start=0.0, duration=162.26, channel=0, supervisions=[SupervisionSegment(id='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', recording_id='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', start=0.0, duration=162.26, channel=0, text='அவ்வையார் விருது தமிழ்நாட்டில் சமூகநலப் பணிகளை அரப்பணிப்புடன் செயலாற்றியதாக 2020ஆம் ஆண்டிற்கான அவ்வையார் விருதுக்கு தேர்வு செய்யப்பட்ட திருவண்ணாமலையைச் சேர்ந்த சமூக சேவகி திருமதி', language=None, speaker='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', gender=None, custom={'origin': 'giga'}, alignment=None)], features=Features(type='kaldifeat-fbank', num_frames=16226, num_features=80, frame_shift=0.01, sampling_rate=8000, start=0, duration=162.26, storage_type='lilcom_chunky', storage_path='/home/armuser/10TBHDD/CUDA_11.6/icefall/egs/tamil/ASR/data/fbank/train_split/tamil_feats_train_00032581.lca', storage_key='964876,45872,45111,44652,45255,45498,44806,45091,45317,44865,45016,44804,44720,44784,44749,45046,44983,44943,45297,44866,45335,45125,45507,44978,44909,44841,44914,44718,44569,45297,44670,45390,44619,20203', recording_id='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', channels=0), recording=Recording(id='Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97', sources=[AudioSource(type='file', channels=[0], source='/media/ASR_database/shruthilipi_data/tamil/newsonair_renamed /Regional-Tiruchirapalli-Tamil-1345-202039142853_sent_97.wav')], sampling_rate=8000, num_samples=1298080, duration=162.26, channel_ids=[0], transforms=None),
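For reference, a minimal sketch of how such a saved batch can be inspected, assuming it follows lhotse's K2SpeechRecognitionDataset layout (a dict with "inputs" and "supervisions"); the file name is the one from the log above, so adjust it to your exp dir:

    import torch

    # Minimal sketch: inspect a batch that icefall saved when it hit OOM.
    # Assumes the K2SpeechRecognitionDataset batch layout described above.
    batch = torch.load(
        "zipformer/exp/batch-6c307511-b2b9-437a-28df-6ec4ce4a2bbd.pt",
        map_location="cpu",
    )

    print("features shape:", batch["inputs"].shape)  # (N, T, C)

    sup = batch["supervisions"]
    print("num_frames:", sup["num_frames"])

    # If the dataset was created with return_cuts=True, the cuts are included
    # and their durations show which utterance blew up the batch.
    for cut in sup.get("cut", []):
        print(cut.id, cut.duration)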
Kindly suggest how to resolve this issue.
We have a function in train.py that removes long and short utterances, and it is enabled by default. Please don't disable it.
@csukuangfj In train.py there is train_cuts = train_cuts.filter(remove_short_utt). I was not able to find any option for long utterances.
Which file are you referring to?
Please recheck.
@csukuangfj I am using this file: https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/zipformer/train.py
Please refer to the librispeech recipe:
https://github.com/k2-fsa/icefall/blob/ad966fb81d76c9b6780cac6844d9c4aa1782a46b/egs/librispeech/ASR/zipformer/train.py#L1377-L1385
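For context, the filtering in the linked librispeech recipe looks roughly like the sketch below; the 1 s / 20 s thresholds are the LibriSpeech defaults, so check the statistics of your own corpus before reusing them:

    def remove_short_and_long_utt(c):
        # Keep only utterances whose duration lies within [1 s, 20 s].
        # The 20 s upper bound was chosen from the LibriSpeech training data;
        # inspect your own corpus (e.g. with local/display_manifest_statistics.py)
        # before reusing it for Tamil.
        return 1.0 <= c.duration <= 20.0

    train_cuts = train_cuts.filter(remove_short_and_long_utt)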
@bsshruthi22
Please read the comment in train.py carefully.
@csukuangfj OK, thanks for your suggestion. The training has now resumed. Hopefully it completes without any errors.
@csukuangfj Is there any way to retain audio that is longer than 20s or shorter than 1s by modifying the cuts, so that training does not throw an error?
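For reference, a quick way to gauge how much data the 1 s / 20 s filter would drop is to scan the training cut manifest before deciding. A minimal sketch, assuming lhotse is installed; the manifest path is hypothetical, so adjust it to your data/fbank layout:

    from lhotse import load_manifest_lazy

    # Hypothetical path; point this at your actual training cuts manifest.
    cuts = load_manifest_lazy("data/fbank/tamil_cuts_train.jsonl.gz")

    total = too_long = too_short = 0
    for c in cuts:
        total += 1
        if c.duration > 20.0:
            too_long += 1
        elif c.duration < 1.0:
            too_short += 1

    print(f"{too_long}/{total} cuts are longer than 20 s, "
          f"{too_short}/{total} are shorter than 1 s")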