seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
"Foundational models" is a poor name, as discussed in the article [Reflections on Foundation Models](https://hai.stanford.edu/news/reflections-foundation-models) by Stanford University: > Foundation models are neither “foundational” nor the foundations of AI. We...
I wanted to deploy seamless-m4t-v2 and tested it on some German LibriVox files with **clear speech and no noise**. Unfortunately, the transcription fails: nothing is transcribed except for...
Since you have to specify the source language when you use m4t-v2, I wonder how you handle the problem of language identification. At the moment I use Whisper...
Hi, I want to fine-tune the ASR model on a custom dataset, and two issues have arisen: 1. How can I do the fine-tuning for ASR? Is it possible to make modifications...
Hello, I'm trying to fine-tune the small model for ASR on a custom Egyptian dataset. How can I do it? Here's a data sample of my custom data, is it in...
Can we use an external LM rescoring model such as KenLM for the text decoder part of Seamless M4T for tasks such as ASR or S2T translation?
I have a number of private unlabelled speech corpora for Indian language families, so the obvious choice is to continue pre-training the w2v-BERT2.0 model...
Speech synthesis is not supported for some languages. How can I use my own data to train speech synthesis and support speech-to-speech translation for them?
Hi, the dataset link for the [MuTox](https://github.com/facebookresearch/seamless_communication/blob/main/src/seamless_communication/cli/toxicity/mutox/README.md) dataset given in the README is not accessible. Kindly update the dataset link.
Is there a way to get the output logits, or somehow get word-level confidence scores for the predictions generated?
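On the confidence-score question, here is a minimal, library-agnostic sketch of how word-level confidences are commonly derived once per-step decoder logits are available. It does not use any seamless_communication API; how to obtain the logits from the model is left open. The function names (`token_confidences`, `word_confidences`), the SentencePiece-style `▁` word-boundary marker, and the choice of the minimum token probability as the word score are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, token_ids):
    """Probability assigned to each emitted token.

    step_logits: one list of vocabulary logits per decoding step (assumed input).
    token_ids:   the token id chosen at each step.
    """
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


def word_confidences(tokens, confidences, boundary="▁"):
    """Merge subword tokens into words; a word's score is its weakest token.

    Assumes a token starting with `boundary` opens a new word.
    """
    words = []
    for tok, conf in zip(tokens, confidences):
        if tok.startswith(boundary) or not words:
            words.append([tok.lstrip(boundary), conf])
        else:
            words[-1][0] += tok
            words[-1][1] = min(words[-1][1], conf)
    return [(word, conf) for word, conf in words]


if __name__ == "__main__":
    # Toy two-token vocabulary and hand-made logits, purely illustrative.
    logits = [[math.log(3), 0.0], [0.0, 0.0], [0.0, math.log(9)]]
    ids = [0, 0, 1]
    pieces = ["▁hel", "lo", "▁world"]
    print(word_confidences(pieces, token_confidences(logits, ids)))
```

Taking the minimum is a conservative choice; the product or the mean of the token probabilities are equally common alternatives, and none of them is calibrated without further work.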