
NaN outputs from encoder

Open Manjunath-mlp opened this issue 1 year ago • 12 comments

I am getting NaN outputs from the encoder of the pruned transducer streaming model:

tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], grad_fn=<PermuteBackward0>)

I am running on a Mac CPU. Any suggestions?

Manjunath-mlp avatar Aug 22 '24 13:08 Manjunath-mlp

There should be some logs telling you what to do about it. Have you followed them?

csukuangfj avatar Aug 22 '24 13:08 csukuangfj

I am using a pretrained model to decode. I am not sure which logs you are talking about.

Manjunath-mlp avatar Aug 23 '24 04:08 Manjunath-mlp

Would you mind posting all of the logs?

The info you give is toooo limited.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

These are the args I used:

{'best_train_loss': float("inf"), 'best_valid_loss': float("inf"), 
'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50,
 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4,
 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release',
 'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815', 
'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean', 
'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 
'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty',
 'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall',
 'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/__init__.py',
 'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/__init__.py', 
'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1, 
'use_averaged_model': False,
 'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp', 
'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model', 
'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500', 
'decoding_method': 'fast_beam_search', 'beam_size': 4,
 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 
'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 
'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500, 
'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 
'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 
'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 
'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 
'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 
'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False, 
'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True, 
'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0,
 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True,
 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True,
 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 
'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 
'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048,
 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 
'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768,
 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8,
 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search', 
'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Also, would you mind sharing the command you are using? And could you tell us which steps you have taken?

More details are always helpful.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

Sorry, I just hit enter before pasting everything. Here are the code blocks I am using.

These are the args I used (the same dict as posted above).

# Instantiated the model using the above args
model = get_transducer_model(args)
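For context, the checkpoint loading is roughly as below (a minimal sketch only; the epoch-30.pt filename and the "model" key are assumptions based on the epoch/avg args above and the usual icefall checkpoint layout):

import torch

# Sketch of loading the pretrained weights (assumed layout: a dict with a "model" key).
ckpt_path = (
    "/Users/Manjunath/Downloads/sourcek2/"
    "icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/"
    "exp/epoch-30.pt"
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()  # run in eval mode for decoding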

and I used the LibriSpeech cuts dataset:

args1=Namespace(epoch=30, avg=1, use_averaged_model=True, exp_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/', lang_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/', decoding_method='fast_beam_search', iter=0, context_size=2, max_sym_per_frame=1, return_cuts=True, on_the_fly_feats=False, input_strategy='PrecomputedFeatures', max_duration=10, num_workers=2)

librispeech = LibriSpeechAsrDataModule(args1)
test_clean_cuts = librispeech.test_clean_cuts_soft()
test_other_cuts = librispeech.dev_other_cuts_soft()

test_clean_dl = librispeech.test_dataloaders(test_clean_cuts)
test_other_dl = librispeech.test_dataloaders(test_other_cuts)

test_sets = ["test-clean", "test-other"]
test_dl = [test_clean_dl, test_other_dl]

# Used the first input to decode
for i, j in enumerate(test_clean_dl):
    print(i, j)
    break

# i, j printed as:
0 {'inputs': tensor([[[-1.4938e+01, -1.3318e+01, -1.3666e+01, ..., -9.2335e+00, -9.9011e+00, -1.0107e+01], [-1.4294e+01, -1.2946e+01, -1.2869e+01, ..., -9.3337e+00, -1.0197e+01, -1.0312e+01], [-1.5064e+01, -1.5173e+01, -1.5958e+01, ..., -9.8856e+00, -9.9233e+00, -1.0065e+01], ..., [-1.2570e+01, -1.2061e+01, -1.3426e+01, ..., 5.1158e+37, 9.4185e+37, 1.7210e+38], [-1.4552e+01, -1.3632e+01, -1.3024e+01, ..., 8.7674e+37, 1.6231e+38, 2.9824e+38], [-1.5214e+01, -1.5527e+01, -1.3573e+01, ..., 1.4966e+38, 2.7860e+38, inf]]]), 'supervisions': {'text': ['BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED'], 'sequence_idx': tensor([0], dtype=torch.int32), 'start_frame': tensor([0], dtype=torch.int32), 'num_frames': tensor([1608], dtype=torch.int32), 'cut': [MonoCut(id='7127-75946-0028-495', start=0, duration=16.075, channel=0, supervisions=[SupervisionSegment(id='7127-75946-0028', recording_id='7127-75946-0028', start=0.0, duration=16.075, channel=0, text='BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED', language='English', speaker='7127', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=1608, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=16.075, storage_type='lilcom_chunky', storage_path='../data/fbank/librispeech_feats_test-clean/feats-0.lca', storage_key='2337650,45819,45198,44901,10324', recording_id='None', channels=0), recording=Recording(id='7127-75946-0028', sources=[AudioSource(type='file', channels=[0], source='/grid/codes/icefall/egs/librispeech/ASR/download/LibriSpeech/test-clean/7127/75946/7127-75946-0028.flac')], sampling_rate=16000, num_samples=257200, duration=16.075, channel_ids=[0], transforms=None), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}})]}}

feature = j["inputs"]
supervisions = j["supervisions"]
texts = j["supervisions"]["text"]
feature_lens = supervisions["num_frames"]
feature_lens += 30

import torch
import math

LOG_EPS = math.log(1e-10)

feature = torch.nn.functional.pad(
    feature,
    pad=(0, 0, 0, 30),
    value=LOG_EPS,
)
encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)
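(As a side note on the pad call: pad=(0, 0, 0, 30) leaves the last, feature dimension untouched and appends 30 frames filled with LOG_EPS along the time dimension. A tiny self-contained check, using the 1608-frame, 80-dim shape from the batch above:)

import math
import torch
import torch.nn.functional as F

LOG_EPS = math.log(1e-10)
x = torch.zeros(1, 1608, 80)                      # (batch, frames, feature_dim)
y = F.pad(x, pad=(0, 0, 0, 30), value=LOG_EPS)    # appends 30 frames of LOG_EPS
print(y.shape)                                    # torch.Size([1, 1638, 80])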

Here, for encoder_out, I am getting NaNs.
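(The inputs tensor in the printed batch above already contains values around 1e38 and an inf, so one quick way to narrow this down is to check where the non-finite values first appear; a small debugging sketch, not from the recipe:)

import torch

# Check whether the features are already non-finite before the encoder,
# and whether the encoder output contains NaNs.
print("features finite:", torch.isfinite(feature).all().item())
print("features max abs:", feature.abs().max().item())

with torch.no_grad():
    encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)
print("encoder_out has nan:", torch.isnan(encoder_out).any().item())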

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Could you share the complete file?

You can upload your code file as an attachment in the comment.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

Will this work? fast_beam_search.txt

Manjunath-mlp avatar Aug 23 '24 07:08 Manjunath-mlp

Could you post a runnable PYTHON CODE FILE?

We need to know which script you are using.

csukuangfj avatar Aug 23 '24 07:08 csukuangfj

By the way, I suggest that you follow the doc https://k2-fsa.github.io/icefall/model-export/export-model-state-dict.html to learn how to use pre-trained models.

csukuangfj avatar Aug 23 '24 08:08 csukuangfj

That's the ipynb file I am using to run it; I am unable to attach a .py or .ipynb file. I am trying to implement this https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L444 for the stateless7 streaming model, and I am trying to see the outputs at each timestep.
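Roughly, the per-timestep loop I am after looks like the sketch below (blank_id = 0 and context_size = 2 are taken from the args above; the model.decoder(..., need_pad=False) and model.joiner(...) call shapes are assumed from the non-streaming pruned-transducer greedy search and may need adapting for stateless7 streaming):

import torch

# Per-timestep greedy decoding sketch over encoder_out of shape (1, T, C).
blank_id = 0
context_size = 2
hyp = [blank_id] * context_size

decoder_input = torch.tensor([hyp], dtype=torch.int64)
decoder_out = model.decoder(decoder_input, need_pad=False)   # (1, 1, decoder_dim)

for t in range(encoder_out.size(1)):
    cur_enc = encoder_out[:, t : t + 1, :].unsqueeze(2)      # (1, 1, 1, encoder_dim)
    logits = model.joiner(cur_enc, decoder_out.unsqueeze(1)) # (1, 1, 1, vocab_size)
    y = logits.argmax(dim=-1).item()                         # best token at this frame
    print(t, y)
    if y != blank_id:
        hyp.append(y)
        decoder_input = torch.tensor([hyp[-context_size:]], dtype=torch.int64)
        decoder_out = model.decoder(decoder_input, need_pad=False)

print("decoded token ids:", hyp[context_size:])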

Manjunath-mlp avatar Aug 23 '24 08:08 Manjunath-mlp

I think I have loaded the state dict of the pretrained model pretty much the same way you have implemented it. For model.decoder I can see the model predicting numbers; I don't know why the encoder is predicting NaN.

Manjunath-mlp avatar Aug 23 '24 08:08 Manjunath-mlp