I have some questions about RNNT loss.
Hello, I would like to ask a question that may be somewhat trivial. The shape of the logits for RNN-T loss is (batch, max_seq_len, max_target_len + 1, class). Why is there a +1 on max_target_len? Shouldn't the +1 be on the number of classes instead, i.e. the total vocab size plus one, since blank is included? I don't understand this at all. Can anyone help?
https://pytorch.org/audio/main/generated/torchaudio.functional.rnnt_loss.html
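For concreteness, here is a minimal sketch of calling the loss with the shapes described in the docs. The toy sizes, the blank = 0 choice, and the torchaudio.functional.rnnt_loss call are my own illustrative assumptions, not part of the original question:

```python
import torch
import torchaudio

# Toy sizes (hypothetical): 10 real tokens + 1 blank = 11 classes.
batch, max_seq_len, max_target_len, num_classes = 2, 50, 20, 11

# logits: (batch, max_seq_len, max_target_len + 1, num_classes)
logits = torch.randn(
    batch, max_seq_len, max_target_len + 1, num_classes, requires_grad=True
)

# targets hold token IDs; blank (here index 0) never appears in them.
targets = torch.randint(1, num_classes, (batch, max_target_len), dtype=torch.int32)

logit_lengths = torch.full((batch,), max_seq_len, dtype=torch.int32)
target_lengths = torch.full((batch,), max_target_len, dtype=torch.int32)

loss = torchaudio.functional.rnnt_loss(
    logits, targets, logit_lengths, target_lengths, blank=0
)
print(loss)
```

Note that the +1 sits on the target axis of logits, while num_classes already accounts for blank.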
max_target_len + 1 is not the vocab size. They are two different things.
You can find my implementation at https://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/csrc/cpu.cc#L83
@csukuangfj Thank you.
I phrased that in a misleading way. What I'm actually curious about is why the third dimension of the logits has to be target_length + 1. Looking at your code, I noticed that you use target_length + 1 because it includes a blank label. But isn't blank already included in n_class? (When setting n_class, I think it should be len(vocab) + 1, similar to CTC loss.) I don't quite understand.
You need to differentiate between target length and number of classes. The transcript of an utterance is converted to tokens. The target length is the number of tokens in the transcript; it is not the number of classes. The possible values of a token are in the range [1, num_of_classes - 1].
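To make that distinction concrete, here is a tiny illustrative setup; the vocab and token IDs below are made up, not taken from the linked code:

```python
# Hypothetical character vocab; index 0 is reserved for blank.
vocab = {"<blk>": 0, "c": 1, "a": 2, "t": 3, "s": 4}
num_classes = len(vocab)  # 5: blank + 4 real tokens

transcript = "cats"
targets = [vocab[ch] for ch in transcript]  # [1, 2, 3, 4]
target_length = len(targets)                # 4: number of tokens, not num_classes

# Every token ID lies in [1, num_classes - 1]; blank (0) never appears in targets.
assert all(1 <= t <= num_classes - 1 for t in targets)
```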
So the number of classes should be len(vocab)? I see now. I had misunderstood the mechanism of the RNN-Transducer. Since the model starts from a blank label, that dimension should be target_length + 1.
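A minimal sketch of why the target axis has size target_length + 1: the prediction network is fed a blank/start symbol followed by the target tokens, so it produces U + 1 states that the joiner combines with the T encoder frames. The additive joiner below is a toy assumption, not the torchaudio or optimized_transducer implementation:

```python
import torch

T, U, num_classes, dim = 50, 20, 11, 256  # hypothetical sizes

encoder_out = torch.randn(T, dim)  # one vector per acoustic frame

# The prediction network sees blank/<sos> plus the U target tokens,
# giving U + 1 states; state 0 means "no token emitted yet".
predictor_out = torch.randn(U + 1, dim)

# Joiner: combine every (t, u) pair, then project to the class dimension.
joint = encoder_out.unsqueeze(1) + predictor_out.unsqueeze(0)  # (T, U + 1, dim)
logits = torch.nn.Linear(dim, num_classes)(torch.tanh(joint))  # (T, U + 1, num_classes)
print(logits.shape)  # torch.Size([50, 21, 11])
```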
Great to hear it resolves your issue.
@csukuangfj Thank you for your kindness.