
NaN outputs from encoder

Open Manjunath-mlp opened this issue 1 year ago • 12 comments

I am getting NaN outputs from the encoder of the pruned transducer streaming model:

tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], grad_fn=<PermuteBackward0>)

I am running on a Mac CPU. Any suggestions?

Manjunath-mlp avatar Aug 22 '24 13:08 Manjunath-mlp

There should be some logs telling you what to do about it. Have you followed them?

csukuangfj avatar Aug 22 '24 13:08 csukuangfj

I am using a pretrained model to decode. I am not sure which logs you are talking about.

Manjunath-mlp avatar Aug 23 '24 04:08 Manjunath-mlp

Would you mind posting all of the logs?

The info you give is toooo limited.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

These are the args I used:

{'best_train_loss': float("inf"), 'best_valid_loss': float("inf"), 
'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50,
 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4,
 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release',
 'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815', 
'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean', 
'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 
'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty',
 'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall',
 'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/__init__.py',
 'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/__init__.py', 
'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1, 
'use_averaged_model': False,
 'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp', 
'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model', 
'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500', 
'decoding_method': 'fast_beam_search', 'beam_size': 4,
 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 
'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 
'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500, 
'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 
'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 
'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 
'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 
'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 
'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False, 
'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True, 
'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0,
 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True,
 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True,
 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 
'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 
'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048,
 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 
'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768,
 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8,
 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search', 
'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Also, would you mind sharing the command you are using? And could you tell us which steps you have taken?

More details are always helpful.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

Sorry, I just hit enter before pasting everything. Here are the code blocks I am using.

These are the args I used (the same dict as posted above).

# Instantiated the model using the above args
model = get_transducer_model(args)
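For context, the checkpoint loading is roughly as below (a minimal sketch only; the epoch-30.pt filename and the "model" key are assumptions based on the epoch/avg args above and the usual icefall checkpoint layout):

import torch

# Sketch of loading the pretrained weights (assumed layout: a dict with a "model" key).
ckpt_path = (
    "/Users/Manjunath/Downloads/sourcek2/"
    "icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/"
    "exp/epoch-30.pt"
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # run in eval mode for decoding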

and I used the LibriSpeech cuts dataset:

args1=Namespace(epoch=30, avg=1, use_averaged_model=True, exp_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/', lang_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/', decoding_method='fast_beam_search', iter=0, context_size=2, max_sym_per_frame=1, return_cuts=True, on_the_fly_feats=False, input_strategy='PrecomputedFeatures', max_duration=10, num_workers=2)

librispeech = LibriSpeechAsrDataModule(args1)
test_clean_cuts = librispeech.test_clean_cuts_soft()
test_other_cuts = librispeech.dev_other_cuts_soft()

test_clean_dl = librispeech.test_dataloaders(test_clean_cuts)
test_other_dl = librispeech.test_dataloaders(test_other_cuts)

test_sets = ["test-clean", "test-other"]
test_dl = [test_clean_dl, test_other_dl]

# Used the first input to decode
for i, j in enumerate(test_clean_dl):
    print(i, j)
    break

# i, j printed as:
0 {'inputs': tensor([[[-1.4938e+01, -1.3318e+01, -1.3666e+01, ..., -9.2335e+00, -9.9011e+00, -1.0107e+01], [-1.4294e+01, -1.2946e+01, -1.2869e+01, ..., -9.3337e+00, -1.0197e+01, -1.0312e+01], [-1.5064e+01, -1.5173e+01, -1.5958e+01, ..., -9.8856e+00, -9.9233e+00, -1.0065e+01], ..., [-1.2570e+01, -1.2061e+01, -1.3426e+01, ..., 5.1158e+37, 9.4185e+37, 1.7210e+38], [-1.4552e+01, -1.3632e+01, -1.3024e+01, ..., 8.7674e+37, 1.6231e+38, 2.9824e+38], [-1.5214e+01, -1.5527e+01, -1.3573e+01, ..., 1.4966e+38, 2.7860e+38, inf]]]), 'supervisions': {'text': ['BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED'], 'sequence_idx': tensor([0], dtype=torch.int32), 'start_frame': tensor([0], dtype=torch.int32), 'num_frames': tensor([1608], dtype=torch.int32), 'cut': [MonoCut(id='7127-75946-0028-495', start=0, duration=16.075, channel=0, supervisions=[SupervisionSegment(id='7127-75946-0028', recording_id='7127-75946-0028', start=0.0, duration=16.075, channel=0, text='BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED', language='English', speaker='7127', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=1608, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=16.075, storage_type='lilcom_chunky', storage_path='../data/fbank/librispeech_feats_test-clean/feats-0.lca', storage_key='2337650,45819,45198,44901,10324', recording_id='None', channels=0), recording=Recording(id='7127-75946-0028', sources=[AudioSource(type='file', channels=[0], source='/grid/codes/icefall/egs/librispeech/ASR/download/LibriSpeech/test-clean/7127/75946/7127-75946-0028.flac')], sampling_rate=16000, num_samples=257200, duration=16.075, channel_ids=[0], transforms=None), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}})]}}

feature = j["inputs"]
supervisions = j["supervisions"]
texts = j["supervisions"]["text"]
feature_lens = supervisions["num_frames"]
feature_lens += 30

import torch
import math

LOG_EPS = math.log(1e-10)

feature = torch.nn.functional.pad(
    feature,
    pad=(0, 0, 0, 30),
    value=LOG_EPS,
)
encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)
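(As a side note on the pad call: pad=(0, 0, 0, 30) leaves the last, feature dimension untouched and appends 30 frames filled with LOG_EPS along the time dimension. A tiny self-contained check, using the 1608-frame, 80-dim shape from the batch above:)

import math
import torch
import torch.nn.functional as F

LOG_EPS = math.log(1e-10)
x = torch.zeros(1, 1608, 80)                      # (batch, frames, feature_dim)
y = F.pad(x, pad=(0, 0, 0, 30), value=LOG_EPS)    # appends 30 frames of LOG_EPS
print(y.shape)                                    # torch.Size([1, 1638, 80])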

Here, for encoder_out, I am getting NaNs.
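(The inputs tensor in the printed batch above already contains values around 1e38 and an inf, so one quick way to narrow this down is to check where the non-finite values first appear; a small debugging sketch, not from the recipe:)

import torch

# Check whether the features are already non-finite before the encoder,
# and whether the encoder output contains NaNs.
print("features finite:", torch.isfinite(feature).all().item())
print("features max abs:", feature.abs().max().item())

with torch.no_grad():
    encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)
print("encoder_out has nan:", torch.isnan(encoder_out).any().item())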

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Could you share the complete file?

You can upload your code file as an attachment in the comment.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

Will this work? fast_beam_search.txt

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Could you post a runnable PYTHON CODE FILE?

We need to know which script you are using.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

By the way, I suggest that you follow the doc https://k2-fsa.github.io/icefall/model-export/export-model-state-dict.html to learn how to use pre-trained models.

csukuangfj avatar Aug 23 '24 08:08 csukuangfj

That's the ipynb file I am using to run it; I am unable to attach a .py or .ipynb file. I am trying to implement this https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L444 for the stateless7 streaming model, and I am trying to see the outputs at each timestep.
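Roughly, the per-timestep loop I am after looks like the sketch below (blank_id = 0 and context_size = 2 are taken from the args above; the model.decoder(..., need_pad=False) and model.joiner(...) call shapes are assumed from the non-streaming pruned-transducer greedy search and may need adapting for stateless7 streaming):

import torch

# Per-timestep greedy decoding sketch over encoder_out of shape (1, T, C).
blank_id = 0
context_size = 2
hyp = [blank_id] * context_size

decoder_input = torch.tensor([hyp], dtype=torch.int64)
decoder_out = model.decoder(decoder_input, need_pad=False)   # (1, 1, decoder_dim)

for t in range(encoder_out.size(1)):
    cur_enc = encoder_out[:, t : t + 1, :].unsqueeze(2)      # (1, 1, 1, encoder_dim)
    logits = model.joiner(cur_enc, decoder_out.unsqueeze(1)) # (1, 1, 1, vocab_size)
    y = logits.argmax(dim=-1).item()                         # best token at this frame
    print(t, y)
    if y != blank_id:
        hyp.append(y)
        decoder_input = torch.tensor([hyp[-context_size:]], dtype=torch.int64)
        decoder_out = model.decoder(decoder_input, need_pad=False)

print("decoded token ids:", hyp[context_size:])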

Manjunath-mlp avatar Aug 23 '24 08:08 Manjunath-mlp

I think I have loaded the state dict of the pretrained model pretty much the same way you have implemented it. For model.decoder I can see the model predicting numbers; I don't know why the encoder is predicting NaN.

Manjunath-mlp avatar Aug 23 '24 08:08 Manjunath-mlp