icefall
LODR RNNLM rescoring requirements
I am trying to understand the requirements on the RNNLM for LODR rescoring
I am using something along the lines of the Librispeech pruned_transducer_stateless3 recipe, with
https://github.com/k2-fsa/icefall/tree/master/egs/ptb/LM as the prototype for LM training (except 3 layers, 600 dim, and tie-weights true).
I get the following error messages:
2022-12-09 03:13:25,663 INFO [decode1.py:1185] lm filename: 2gram.fst.txt
2022-12-09 03:13:25,796 INFO [decode1.py:1191] num states: 453
2022-12-09 03:13:26,397 INFO [model.py:69] Tying weights
2022-12-09 03:13:26,397 INFO [checkpoint.py:112] Loading checkpoint from ../ngLM/rnnlm-exp/epoch-0.pt
Traceback (most recent call last):
File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1259, in
It's not clear to me what this message means and how to fix this. Some guidance is appreciated.
It seems to me that the model architecture does not match. You might need to change hidden-dim to 600.
Can you check that you use the same RNN LM model parameters for both train.py and decode.py?
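For illustration, a mismatched hidden-dim typically shows up as a tensor size mismatch when the checkpoint is loaded. Below is a minimal, self-contained PyTorch toy (my own sketch, not icefall's actual RnnLmModel or checkpoint loader) showing how training with one hidden_dim and decoding with another fails:

import torch
import torch.nn as nn

# Toy stand-in for an RNN LM (illustration only, not icefall code): the
# checkpoint stores tensor shapes, so loading a model trained with
# hidden_dim=600 into a model built with a different hidden_dim raises a
# size-mismatch RuntimeError from load_state_dict().
class ToyRnnLm(nn.Module):
    def __init__(self, vocab_size=500, embedding_dim=600, hidden_dim=600, num_layers=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

torch.save(ToyRnnLm(hidden_dim=600).state_dict(), "/tmp/toy-rnnlm.pt")   # "train.py" side

try:                                                                     # "decode.py" side
    ToyRnnLm(hidden_dim=2048).load_state_dict(torch.load("/tmp/toy-rnnlm.pt"))
except RuntimeError as e:
    print("size mismatch:", e)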
Thanks. That resolved it.
I ran into a different issue this time. May I get some guidance here on how to resolve the issue? This doesn't happen on the very first pass into the code, but it does happen for the first file. The bigram LM seems to load fine though.
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1259, in
Could you please post the full command you are using to invoke decode.py?
python3 -m pdb ./pruned_transducer_stateless3/decode1.py \
  --iter 480000 \
  --avg 20 \
  --simulate-streaming 1 \
  --causal-convolution 1 \
  --decode-chunk-size 16 \
  --left-context 64 \
  --exp-dir ./pruned_transducer_stateless3/exp \
  --max-duration 600 \
  --beam 4.0 \
  --max-contexts 8 \
  --max-states 32 \
  --decoding-method modified_beam_search_rnnlm_LODR \
  --rnn-lm-scale 0.4 \
  --rnn-lm-exp-dir ../LM/rnnlm-exp \
  --rnn-lm-epoch 0 \
  --rnn-lm-avg 1 \
  --rnn-lm-num-layers 3 \
  --rnn-lm-tie-weights 1 \
  --tokens-ngram 2 \
  --ngram-lm-scale -0.16
I also notice something unusual about the 2gram.fst.txt. It ends with
452 282 461 461 11.9809
452 0 500 0 5.33528
452 0.328656
but there is no state '0': the arc 452 0 500 0 5.33528 transitions to state "0", yet state "0" is never defined, not even as a final state the way 452 is.
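In case it is useful, this is the quick check one can run on the file to see whether a state really never appears as a source or as a final state (a throwaway script of mine; the path and the OpenFst text-format layout are my assumptions):

# Throwaway helper: in OpenFst text format, arc lines are
# "src dst ilabel olabel [weight]" and final-state lines are "state [weight]".
# It lists states that appear only as destinations, never as source/final.
sources, destinations, finals = set(), set(), set()

with open("data/lang_bpe_500/2gram.fst.txt") as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 4:        # arc line
            sources.add(int(fields[0]))
            destinations.add(int(fields[1]))
        elif fields:                # final-state line
            finals.add(int(fields[0]))

print("destination-only states:", sorted(destinations - sources - finals))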
If 500 is the ID of #0, then I think state 0 corresponds to the backoff state.
assert current_ngram_score <= 0.0, (
AssertionError: (-inf, -inf)
Could you check that the default value 500 is also the ID of #0 in your tokens.txt?
https://github.com/k2-fsa/icefall/blob/b25c234c51426d61552cdca819ab57fe712214c9/egs/librispeech/ASR/pruned_transducer_stateless3/decode.py#L460-L462
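For example, something along these lines can confirm it (a quick check of my own, not part of icefall; the path is an assumption):

# Confirm that the ID of "#0" in tokens.txt matches the backoff id that
# decode.py uses for LODR (default 500).
backoff_id = 500

with open("data/lang_bpe_500/tokens.txt") as f:
    sym2id = {sym: int(idx) for sym, idx in (line.split() for line in f if line.strip())}

print("#0 ->", sym2id["#0"], "matches backoff_id:", sym2id["#0"] == backoff_id)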
Yes, backoff-id is 500. This is the end of tokens.txt (the quotes were inserted by me so that GitHub displays the text properly): "I 496" "P 497" "* 498" "[ 499" "#0 500" "#1 501"
I think there must be something wrong with your bigram. How was it generated?
Also, are you doing a cross-domain or intra-domain evaluation?
Below is the LM generation script I used. In pruned_transducer_stateless3, a bunch of the data for the 2nd output is cross-domain, but the primary output is intra-domain. Evaluation is intra-domain, but on previously unseen data.
#!/usr/bin/env bash
lang_dir=data/lang_bpe_500

for ngram in 2; do
  if [ ! -f $lang_dir/${ngram}gram.arpa ]; then
    ./shared/make_kn_lm.py \
      -ngram-order ${ngram} \
      -text $lang_dir/transcript_tokens.txt \
      -lm $lang_dir/${ngram}gram.arpa
  fi

  if [ ! -f $lang_dir/${ngram}gram.fst.txt ]; then
    python3 -m kaldilm \
      --read-symbol-table="$lang_dir/tokens.txt" \
      --disambig-symbol='#0' \
      --max-order=${ngram} \
      $lang_dir/${ngram}gram.arpa > $lang_dir/${ngram}gram.fst.txt
  fi
done
Never mind, I figured it out: in order to avoid the -inf, the LODR n-gram should have <unk> and must have a vocab of exactly 500 tokens.
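For anyone hitting the same (-inf, -inf) later, this is roughly the sanity check that would have caught it for me (my own sketch, not part of icefall; the paths and the 2-gram order are assumptions): every token the decoder can emit should get probability mass from the LODR n-gram, either directly as a unigram or via <unk>/backoff.

# List tokens from tokens.txt that have no unigram entry in the ARPA used
# for LODR. Special symbols like <blk>, <sos/eos> and the #-disambig symbols
# are expected to be absent; ordinary BPE pieces should not be, unless <unk>
# is present to catch them.
lang_dir = "data/lang_bpe_500"

with open(f"{lang_dir}/tokens.txt") as f:
    tokens = [line.split()[0] for line in f if line.strip()]
tokens = [t for t in tokens if not t.startswith("#")]   # drop disambig symbols

unigrams, in_unigrams = set(), False
with open(f"{lang_dir}/2gram.arpa") as f:
    for line in f:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
        elif line.startswith("\\"):          # \2-grams:, \end\, ...
            in_unigrams = False
        elif in_unigrams and line:
            unigrams.add(line.split()[1])    # "logprob token [backoff]"

print("tokens with no unigram entry:", [t for t in tokens if t not in unigrams])
print("<unk> present in the ARPA:", "<unk>" in unigrams)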
Hi @csukuangfj, I am also facing this issue, but I haven't been able to work it out and find a solution. Could you please let me know how I should solve it?
Error logs
2024-02-22 13:26:51,691 INFO [decode.py:834] Decoding started
2024-02-22 13:26:51,692 INFO [decode.py:840] Device: cuda:0
2024-02-22 13:26:51,696 INFO [decode.py:850] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.17.0.dev+git.230c8fcb.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'c78407a-dirty', 'icefall-git-date': 'Fri Feb 16 16:38:45 2024', 'icefall-path': '/mnt/local/sangeet/workncode/k2-fsa/icefall', 'k2-path': '/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/mnt/local/sangeet/workncode/lhotse/lhotse/__init__.py', 'hostname': 'emlgpu04', 'IP address': '127.0.1.1'}, 'epoch': 30, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal/1200'), 'bpe_model': 'Deu16_icefall/sample_data/lang_bpe_500/bpe.model', 'lang_dir': PosixPath('Deu16_icefall/sample_data/lm'), 'decoding_method': 'modified_beam_search_LODR', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': -0.24, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': True, 'lm_type': 'rnn', 'lm_scale': 0.42, 'tokens_ngram': 2, 'backoff_id': 500, 'context_score': 2.0, 'context_file': '', 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': False, 'manifest_dir': PosixPath('Deu16_icefall/sample_data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 19, 'lm_avg': 2, 'lm_exp_dir': '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800/', 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/1200/modified_beam_search_LODR'), 'has_contexts': False, 'suffix': 'epoch-30-avg-7-chunk-16-left-context-128-modified_beam_search_LODR-beam-size-4-rnn-lm-scale-0.42-LODR-2gram-scale--0.24-use-averaged-model', 'blank_id': 0, 'unk_id': 3, 'vocab_size': 500}
2024-02-22 13:26:51,696 INFO [decode.py:852] About to create model
2024-02-22 13:26:52,409 INFO [decode.py:919] Calculating the averaged model over epoch range from 23 (excluded) to 30
2024-02-22 13:26:56,392 INFO [model.py:75] Tying weights
2024-02-22 13:26:56,392 INFO [lm_wrapper.py:180] averaging ['/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-18.pt', '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-19.pt']
2024-02-22 13:26:58,886 INFO [decode.py:976] Loading token level lm: G_2_gram.fst.txt
2024-02-22 13:26:59,022 INFO [decode.py:982] num states: 12143
2024-02-22 13:26:59,027 INFO [decode.py:1018] Number of model parameters: 66110931
2024-02-22 13:26:59,027 INFO [asr_datamodule.py:409] About to get test cuts
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda-11.2/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
Traceback (most recent call last):
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1051, in <module>
main()
File "/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1028, in main
results_dict = decode_dataset(
^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 680, in decode_dataset
hyps_dict = decode_one_batch(
^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 535, in decode_one_batch
hyp_tokens = modified_beam_search_LODR(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/zipformer/beam_search.py", line 2623, in modified_beam_search_LODR
assert current_ngram_score <= 0.0, (
^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: (-inf, -inf)
I have made sure that my RNN-LM training arguments are the same as the arguments passed to decode.py with modified_beam_search_LODR.
Also, wc -l Deu16_icefall/sample_data/lang_bpe_500/tokens.txt gives 502, and I am not sure how to make it 500 (see the quick count after the listing below),
and the head and tail of tokens.txt look like this:
$ head tokens.txt
<blk> 0
<sos/eos> 1
<UNK> 2
<unk> 3
▁ 4
S 5
T 6
EN 7
E 8
N 9
$ tail tokens.txt
GRÖßTE 492
▁PAKISTAN 493
▁SEPTEMBER 494
▁STREIFEN 495
▁SCHWARZ 496
▁KÜNFTIG 497
▁STUTTGART 498
Q 499
#0 500
#1 501
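For what it's worth, if the two #-prefixed disambiguation symbols at the end are excluded, the count does come out to 500 (a quick check of mine below), though I am not sure whether that is what decode.py actually expects:

# Count the entries in tokens.txt with and without the trailing disambig
# symbols (#0, #1). A quick check of my own, not part of icefall.
with open("Deu16_icefall/sample_data/lang_bpe_500/tokens.txt") as f:
    symbols = [line.split()[0] for line in f if line.strip()]

print("all entries:", len(symbols))                                       # 502
print("without #-symbols:", sum(not s.startswith("#") for s in symbols))  # 500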
It would be great if anyone could help me solve it or give me some clue.
Thank You
Xiaoyu, could you have a look?
I re-trained the RNN-LM and the WER looks better. However, the results do not seem quite consistent.
WER with greedy search: ~9
WER with beam search: ~8.5
WER with modified beam search with shallow fusion and an external LM: ~8.6
WER with modified beam search with LODR (to counter the ILM) and an external LM: ERROR
WER with modified beam search with LM rescoring to re-rank the n-best hypotheses after beam search: ~8.8
@marcoyang1998 any clues why this could be happening? Also, any help on how I could fix the above error would be appreciated.
Thank You
Hello @marcoyang1998, I was wondering if you had a chance to look at the above error and point me in some direction so that I can find out the reason and fix it.
Sorry for getting back so late, I will have a look at the error this week.
Hi @marcoyang1998, still waiting for an update. Any clue would be fine for me to figure out what could be wrong.
Facing the same error
File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
2620 # calculate the score of the latest token
2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
2624 state_cost.lm_score,
2625 hyp.state_cost.lm_score,
2626 )
2627 # score = score + TDLM_score - LODR_score
2628 # LODR_LM_scale should be a negative number here
2629 hyp_log_prob += (
2630 lm_score[new_token] * lm_scale
2631 + LODR_lm_scale * current_ngram_score
2632 + context_score
2633 ) # add the lm score
AssertionError: (-inf, -inf)
Head of the tokens.txt
<blk> 0
<sos/eos> 1
<unk> 2
Tail of the tokens.txt
tail -4 tokens.txt
#0 500
#1 501
#2 502
#3 503
Any help regarding this?
I'll try to find someone to look into this. Basically we need to trace back where the infinity came from and why. That may require adding assert statements to catch the infinity earlier. We should also find or decide where an infinity is "allowed" according to the intended interfaces used here.
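As a self-contained toy illustration (my own sketch, not icefall's actual code) of how the (-inf, -inf) pair in the assertion arises: once the token-level n-gram assigns a hypothesis state a score of -inf (a token it cannot reach even via backoff or <unk>), the next extension computes (-inf) - (-inf) = nan, and nan <= 0.0 is False, which is exactly the assertion that fires.

import math

# Toy version of the quantity checked in modified_beam_search_LODR:
#     current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
prev_state_score = -12.7          # finite: the n-gram could score the prefix
new_state_score = -math.inf       # the new token gets zero probability mass
delta = new_state_score - prev_state_score
print(delta, delta <= 0.0)        # -inf True  -> this step still passes

prev_state_score = new_state_score            # the -inf is carried forward
new_state_score = prev_state_score + (-3.1)   # any further token: still -inf
delta = new_state_score - prev_state_score    # (-inf) - (-inf) = nan
print(delta, delta <= 0.0)        # nan False  -> assert fires with (-inf, -inf)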
The latest code in the master is
https://github.com/k2-fsa/icefall/blob/2d64228efae6ebb85b68a956942374a4801548ae/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L2618-L2625
From your log
File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
2620 # calculate the score of the latest token
2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
2624 state_cost.lm_score,
2625 hyp.state_cost.lm_score,
Could you first try the latest master and see if the issue persists? @duhtapioca
I tried but ended up with the same error.
@duhtapioca Could you please run your code again and print out the value of new_token when triggering this assertion?
Could you please run your code again and print out the value of new_token when triggering this assertion?
Yes, will try that and share the output soon.
@marcoyang1998
The output is now
************** The new token is - 98
************** The new token is - 94
************** The new token is - 308
************** The new token is - 280
************** The new token is - 95
************** The new token is - 60
************** The new token is - 233
************** The new token is - 19
************** The new token is - 23
************** The new token is - 103
************** The new token is - 207
************** The new token is - 8
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[6], line 1
----> 1 predict_modified_beamsearch_LODR(['/home/azureuser/users/shreya/hindi_test_Set/hindi_test_main/test_main_wavs/3aa183cc-dd52-4822-9f2d-9ccaebafbac2.wav'])
File /anaconda/envs/k2_icefall/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
Cell In[5], line 227, in predict_modified_beamsearch_LODR(batch)
217 context_graph.build(contexts)
219 hyp_tokens = modified_beam_search(
220 model=model,
221 encoder_out=encoder_out,
(...)
224 context_graph=context_graph,
225 )
--> 227 hyp_tokens = modified_beam_search_LODR(
228 model=model,
229 encoder_out=encoder_out,
230 encoder_out_lens=encoder_out_lens,
231 beam=params.beam_size,
232 LODR_lm=ngram_lm,
233 LODR_lm_scale=-0.24,
234 LM=LM,
235 context_graph=None,
236 )
237 print("Hyp tokens created")
239 for hyp in sp.decode(hyp_tokens):
File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
2622 print("************** The new token is - "+ str(new_token))
-> 2623 assert current_ngram_score <= 0.0, (
2624 state_cost.lm_score,
2625 hyp.state_cost.lm_score,
2626 )
2627 # score = score + TDLM_score - LODR_score
2628 # LODR_LM_scale should be a negative number here
2629 hyp_log_prob += (
2630 lm_score[new_token] * lm_scale
2631 + LODR_lm_scale * current_ngram_score
2632 + context_score
2633 ) # add the lm score
AssertionError: (-inf, -inf)
For reference, the tokens.txt was generated by egs/librispeech/ASR/local/prepare_lang_bpe.py.