espnet
Hello, how can I add scores (lm, ctc, ngram) for each token in data.json?
In beam search, each result has scores (ctc, lm, ngram). Is this a cumulative probability? There is only one number per hypothesis. If I want to know the probability of each individual token, what should I do? To explain further: for a sequence x1, x2, x3, the prefix x1 generates score_1, x1, x2 generate score_2, and x1, x2, x3 generate score_3, but I want to know the separate scores for x1, x2, and x3. What should I do?
We don't provide such a function, but that information is stored in nbest_hyps in https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search.py#L329-L337.
Therefore, if you modify the output format (e.g., https://github.com/espnet/espnet/blob/master/espnet2/bin/asr_inference.py#L535), you can obtain these results.
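As a minimal sketch of what such a modification could produce: assuming each entry of nbest_hyps carries a `yseq` token sequence and a `scores` dict of per-model cumulative log-probabilities (as in beam_search.py), you can dump those scores alongside the token ids. Note the `Hypothesis` class below is a simplified local stand-in for illustration, not espnet's actual class.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Hypothesis:
    # Simplified stand-in for espnet's Hypothesis (beam_search.py):
    # yseq holds decoded token ids, scores the per-model cumulative log-probs.
    yseq: List[int]
    score: float
    scores: Dict[str, float] = field(default_factory=dict)

def dump_nbest(nbest_hyps: List[Hypothesis]) -> List[dict]:
    """Turn n-best hypotheses into a JSON-serializable structure that
    keeps the individual model scores (decoder, ctc, lm, ngram)."""
    return [
        {"token_ids": list(h.yseq), "total": h.score, "scores": dict(h.scores)}
        for h in nbest_hyps
    ]

nbest = [Hypothesis(yseq=[4232, 1027], score=-9.33,
                    scores={"decoder": -0.0444, "ctc": -3.59e-05,
                            "lm": -6.423, "ngram": -2.867})]
print(dump_nbest(nbest))
```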
Thank you for your help. I know ASR is an autoregressive model. After testing, it looks like a cumulative score. How can I calculate the score for each token?
For example:
token_ids = [4232], scores = {'decoder': 0.0, 'ctc': 0.0, 'lm': 0.0, 'ngram': 0.0}
token_ids = [4232, 1027], scores = {'decoder': -0.044382162392139435, 'lm': -6.422976970672607, 'ctc': -3.593962173908949e-05, 'ngram': -2.867363929748535}
token_ids = [4232, 1027, 281], scores = {'decoder': -0.14182133972644806, 'lm': -6.4238996505737305, 'ctc': -0.00010113022290170193, 'ngram': -2.8736231327056885}
For example, from these first three rows we can see that:
- When token_ids = [4232]: decoder = 0.0, ctc = 0.0, ngram = 0.0. This is the initialization term.
- When token_ids = [4232, 1027]: this should be the result for the first generated token, 1027: decoder = -0.044382162392139435, ctc = -3.593962173908949e-05, ngram = -2.867363929748535.
- When token_ids = [4232, 1027, 281]: what should I do to calculate the score of token 281?
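Since each entry in `scores` is the cumulative log-probability of the prefix decoded so far, the contribution of a single token can be recovered by subtracting consecutive cumulative dicts. A sketch using the three rows above (the helper name `per_token_scores` is my own, not an espnet API):

```python
# Cumulative scores copied from the three hypotheses above.
cumulative = [
    {"decoder": 0.0, "ctc": 0.0, "lm": 0.0, "ngram": 0.0},          # [4232]
    {"decoder": -0.044382162392139435, "ctc": -3.593962173908949e-05,
     "lm": -6.422976970672607, "ngram": -2.867363929748535},        # [4232, 1027]
    {"decoder": -0.14182133972644806, "ctc": -0.00010113022290170193,
     "lm": -6.4238996505737305, "ngram": -2.8736231327056885},      # [4232, 1027, 281]
]

def per_token_scores(cum):
    """Difference consecutive cumulative score dicts to get each
    token's individual log-probability contribution per scorer."""
    return [
        {k: cur[k] - prev[k] for k in cur}
        for prev, cur in zip(cum, cum[1:])
    ]

steps = per_token_scores(cumulative)
# steps[0] is the contribution of token 1027, steps[1] that of token 281.
for s in steps:
    print(s)
```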
I also have the same question, but it looks like the scores are merged in self.merge_scores.
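For reference, the merged total used to rank beams is conceptually a weighted sum of the individual scorer log-probabilities. A simplified sketch of that merging step, assuming per-scorer weights like those passed to beam search (the weight values here are illustrative, not espnet's defaults):

```python
def merge_scores(scores, weights):
    """Weighted sum of per-scorer log-probabilities; scorers without
    a configured weight contribute nothing to the total."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())

scores = {"decoder": -0.0444, "ctc": -3.59e-05, "lm": -6.423, "ngram": -2.867}
weights = {"decoder": 0.7, "ctc": 0.3, "lm": 0.3, "ngram": 0.3}
print(merge_scores(scores, weights))
```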