Facebook releases SeamlessM4T (Multimodal + Multilingual)

Open • Infinitay opened this issue 1 year ago • 1 comment

SeamlessM4T is a foundational speech/text translation and transcription model that overcomes the limitations of previous systems with state-of-the-art results.

Website: ai.meta.com/resources/models-and-libraries/seamless-communication
Code: facebookresearch/seamless_communication
Paper: ai.meta.com/research/publications/seamless-m4t
Blog Post: ai.meta.com/blog/seamless-m4t


I know this model is for translation, but I wanted to share it in case there is anything in their approach that could help improve whisperX. I don't know much about this area, but from skimming the paper it seems they already do some of what whisperX does, such as relying on VAD and w2v 2.0 ASR (section 3.4.2 of their paper).

Feel free to close this; I just wanted to bring it to your attention in case you haven't come across it yet.

Infinitay, Aug 23 '23 20:08