
Hybrid autoregressive transducer

desh2608 opened this issue 2 years ago · 5 comments

I was wondering if there are any existing recipes for the HAT model. It is a straightforward change (the blank distribution is modeled as a Bernoulli distribution), and it has been shown to be useful for integrating external LMs, among other things.

Has anyone tried it in icefall, especially with the pruned loss?
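For concreteness, the change I mean is roughly the following (a minimal PyTorch sketch of the HAT output factorization, assuming `blank_id = 0` and a standard joiner that produces vocab-sized logits; not an existing icefall recipe):

```python
import torch
import torch.nn.functional as F


def hat_log_probs(logits: torch.Tensor, blank_id: int = 0) -> torch.Tensor:
    """Convert joiner logits into HAT-style log-probabilities.

    The blank is modeled with a Bernoulli distribution (sigmoid on the blank
    logit); the non-blank tokens share the remaining mass (1 - p_blank)
    through a softmax over the non-blank logits only.

    logits: (..., vocab_size) raw joiner outputs.
    Returns log-probabilities of the same shape, in the original token order.
    """
    blank_logit = logits[..., blank_id : blank_id + 1]   # (..., 1)
    log_p_blank = F.logsigmoid(blank_logit)              # log sigma(z_blank)
    log_p_not_blank = F.logsigmoid(-blank_logit)         # log(1 - sigma(z_blank))

    # Softmax restricted to the non-blank tokens.
    non_blank = torch.cat(
        [logits[..., :blank_id], logits[..., blank_id + 1 :]], dim=-1
    )
    log_label = log_p_not_blank + F.log_softmax(non_blank, dim=-1)

    # Re-assemble in the original vocabulary order.
    return torch.cat(
        [log_label[..., :blank_id], log_p_blank, log_label[..., blank_id:]],
        dim=-1,
    )
```

The transducer loss would then be computed on these log-probs instead of a plain log-softmax over the full vocabulary.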

desh2608 avatar Sep 24 '23 20:09 desh2608

We have not tried that. It would be great if you could add it.

csukuangfj avatar Sep 24 '23 23:09 csukuangfj

@csukuangfj Do you have advice on what would be a good evaluation setup for using HAT to integrate external LMs? For example, how did you evaluate the LODR methods?

desh2608 avatar Sep 27 '23 13:09 desh2608

For a POC, I was just training a model on LibriSpeech, and was planning to use an external RNNLM. But Dan pointed out that LibriSpeech may not be the best test-bed for these experiments.
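The kind of fusion I had in mind is roughly this (a sketch only; the scales are placeholders, and the internal-LM estimate would come from running the joiner with the encoder contribution zeroed out, as in the HAT paper):

```python
import torch


def hat_fusion_score(
    log_p_label: torch.Tensor,   # HAT label log-probs for the next token, (V,)
    log_p_ext_lm: torch.Tensor,  # external RNNLM log-probs, (V,)
    log_p_ilm: torch.Tensor,     # internal-LM estimate (joiner with zeroed encoder output), (V,)
    lm_scale: float = 0.3,       # placeholder weights; tune on a dev set
    ilm_scale: float = 0.1,
) -> torch.Tensor:
    """Shallow fusion with internal-LM subtraction, which HAT makes
    well-defined because the label posterior is factored out from blank."""
    return log_p_label + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```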

desh2608 avatar Sep 27 '23 13:09 desh2608

@marcoyang1998

Could you have a look?

csukuangfj avatar Sep 27 '23 14:09 csukuangfj

You may try cross-domain evaluation scenarios, e.g. decoding the LibriSpeech model on the Gigaspeech test sets using an RNNLM trained on the Gigaspeech transcripts. I believe I tested LODR in this scenario and it yielded better results than using only shallow fusion.
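Roughly, LODR combines the scores like this during beam search (illustrative weights only; they need tuning on a dev set):

```python
import torch


def lodr_score(
    log_p_hyp: torch.Tensor,            # transducer/HAT score of the partial hypothesis
    log_p_target_lm: torch.Tensor,      # e.g. RNNLM trained on GigaSpeech transcripts
    log_p_source_bigram: torch.Tensor,  # low-order n-gram trained on the source-domain text
    lm_scale: float = 0.4,              # illustrative weights
    lodr_scale: float = 0.2,
) -> torch.Tensor:
    """LODR-style rescoring: add the target-domain LM and subtract a
    low-order source-domain LM as a cheap proxy for the internal LM."""
    return log_p_hyp + lm_scale * log_p_target_lm - lodr_scale * log_p_source_bigram
```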

marcoyang1998 avatar Sep 27 '23 14:09 marcoyang1998