LM Rescoring for Seamless text decoder

Open Sameep-c opened this issue 11 months ago • 1 comment

Can we use an external language model such as KenLM to rescore the output of the Seamless M4T text decoder, for tasks such as ASR or S2T translation?

Sameep-c avatar Feb 27 '24 06:02 Sameep-c

Of course we can! A challenging part would be to properly align the tokens from the language model and from Seamless. I am not sure there is code that you can apply out of the box for this, but it is certainly a solvable task.

But I think that LM rescoring with Seamless doesn't make as much sense as with CTC-based ASR models, because the Seamless text decoder is already an autoregressive transformer language model on its own.
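One way to sidestep the token alignment problem is to rescore at the text level: take an n-best list of detokenized hypotheses with their decoder log-probabilities, score each full sentence with the external LM, and re-rank by a weighted sum. The snippet below is only a minimal sketch of that idea, not code from Seamless. `rescore_nbest`, `toy_lm_score`, `lm_weight` and `length_bonus` are hypothetical names, and the toy unigram scorer stands in for a real LM. How to obtain the n-best list and its scores from Seamless is assumed and not shown.

```python
import math
from typing import Callable, List, Tuple


def rescore_nbest(
    hypotheses: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.3,
    length_bonus: float = 0.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, decoder_log_prob) pairs using an external LM.

    combined = decoder_log_prob + lm_weight * lm_score(text)
               + length_bonus * number_of_words
    Scoring whole detokenized sentences means the LM vocabulary never has
    to be aligned with the decoder's subword tokens.
    """
    rescored = []
    for text, decoder_log_prob in hypotheses:
        n_words = len(text.split())
        combined = (
            decoder_log_prob
            + lm_weight * lm_score(text)
            + length_bonus * n_words
        )
        rescored.append((text, combined))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(text: str) -> float:
    """Hypothetical stand-in LM: natural-log unigram probabilities,
    with a small floor probability for unknown words."""
    unigram = {"the": 0.2, "cat": 0.05, "sat": 0.05, "on": 0.1, "mat": 0.05}
    return sum(math.log(unigram.get(w, 1e-4)) for w in text.lower().split())


# Made-up n-best list: the decoder slightly prefers the wrong hypothesis.
nbest = [
    ("the cat sad on the mat", -4.0),
    ("the cat sat on the mat", -4.2),
]
best_text, best_score = rescore_nbest(nbest, toy_lm_score)[0]
print(best_text)
```

With KenLM, `lm_score` could wrap the `score` method of a loaded `kenlm.Model`. That method returns a log10 probability, so it should be multiplied by ln(10) if the decoder scores are natural logs. `lm_weight` and `length_bonus` would need tuning on held-out data, and since the decoder is already a language model, the gain may be small.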

avidale avatar Feb 27 '24 08:02 avidale