seamless_communication issues

On "foundational" models

Calling these models "foundational" is a misnomer, as discussed in the article [Reflections on Foundation Models](https://hai.stanford.edu/news/reflections-foundation-models) by Stanford University: > Foundation models are neither “foundational” nor the foundations of AI. We...

StEvUgnIn

Seamless-M4T-v2: catastrophic transcription error on clear audio (German), but the file works fine in Whisper v2

I wanted to deploy seamless-m4t-v2 and tested it on some German LibriVox files with **clear speech and no noise**. Unfortunately, the transcription fails: it transcribes nothing except for...

asusdisciple

M4T-v2 Language Identification for Audio / Spoken Language

5

Since you have to specify the source language when using M4T-v2, I wonder how you handle language identification. At the moment I use Whisper...

asusdisciple
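The approach the issue describes is to run a separate language-identification (LID) model first and pass its result on as the source language. The sketch below shows that step in minimal form. It is not part of seamless_communication. It assumes the LID model (Whisper, for instance) yields a dictionary that maps two-letter ISO 639-1 codes to probabilities. It also assumes the translation model expects three-letter codes such as `deu`. The table `ISO1_TO_ISO3` and the function `pick_src_lang` are hypothetical and cover only a few languages.

```python
from typing import Dict, Optional

# Hypothetical mapping from two-letter ISO 639-1 codes (as a LID model might
# return) to three-letter ISO 639-3 codes. Only a handful of languages shown.
ISO1_TO_ISO3 = {
    "de": "deu",
    "en": "eng",
    "fr": "fra",
    "es": "spa",
    "it": "ita",
    "ja": "jpn",
}


def pick_src_lang(lang_probs: Dict[str, float], min_prob: float = 0.5) -> Optional[str]:
    """Return the three-letter code of the most likely language, or None.

    lang_probs maps two-letter codes to probabilities, e.g. {"de": 0.93, "en": 0.04}.
    Returns None when the input is empty, when the top language scores below
    min_prob, or when it is missing from the mapping table.
    """
    if not lang_probs:
        return None
    best, prob = max(lang_probs.items(), key=lambda kv: kv[1])
    if prob < min_prob:
        return None
    return ISO1_TO_ISO3.get(best)


print(pick_src_lang({"de": 0.93, "en": 0.04}))  # deu
```

When detection is uncertain, the function returns `None` instead of guessing. The caller can then fall back, for example by asking the user for the language.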

Fine-tuning for ASR and dataset preparation

3

Hi, I want to fine-tune the ASR on a custom dataset, and two questions have come up: 1. How can I fine-tune for ASR? Is it possible to make modifications...

Gxhappiness

ASR fine-tuning

4

Hello, I'm trying to fine-tune a small model for ASR on a custom Egyptian dataset. How can I do it? Here's a sample of my custom data; is it in...

h9-tect

LM Rescoring for Seamless text decoder

1

Can we use an external LM rescoring model such as KenLM for the text decoder part of Seamless M4T for tasks such as ASR or S2T translation?

Sameep-c
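External-LM rescoring is usually applied to an n-best list. Each decoder hypothesis keeps its model log-probability, an external LM scores the text, and the hypotheses are re-ranked by a weighted sum of the two. The sketch below shows this in minimal form. It assumes you can obtain n-best hypotheses with their log-probabilities (e.g. from beam search); how to extract them from Seamless M4T is not covered here.

Several pieces are stand-ins:

- `toy_lm_logprob` is a hypothetical toy unigram model standing in for a real LM. With KenLM's Python module, `kenlm.Model(path).score(text)` returns a log10 probability that could take its place.
- `lm_weight` and `length_bonus` are illustrative and would need tuning on a dev set. The weight also absorbs the difference between natural-log decoder scores and log10 LM scores.

```python
import math
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.3,
    length_bonus: float = 0.0,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, decoder_logprob) pairs by
    decoder_logprob + lm_weight * lm_logprob(text) + length_bonus * num_words."""
    rescored = []
    for text, dec_lp in nbest:
        score = dec_lp + lm_weight * lm_logprob(text) + length_bonus * len(text.split())
        rescored.append((text, score))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for an external LM: log10 unigram probabilities,
# with a small floor probability for unknown words.
UNIGRAMS = {"the": 0.05, "cat": 0.01, "sat": 0.01, "cut": 0.001, "sad": 0.002}


def toy_lm_logprob(text: str) -> float:
    return sum(math.log10(UNIGRAMS.get(w, 1e-6)) for w in text.lower().split())


# The decoder slightly prefers the wrong hypothesis; the LM flips the ranking.
nbest = [("the cat sad", -1.0), ("the cat sat", -1.2)]
print(rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.5)[0][0])  # the cat sat
```

Rescoring only re-orders hypotheses that the decoder already produced. Fusing the LM into beam search itself (shallow fusion) is a different technique that would need changes inside the decoding loop.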

Scripts to reproduce w2v-BERT 2.0 pretraining?

2

I have several private unlabelled speech corpora for Indian language families, so naturally I want to continue pre-training the w2v-BERT 2.0 model...

StephennFernandes

Some languages do not support speech synthesis

1

Some languages do not support speech synthesis. How can I use my own data to train speech synthesis and support speech-to-speech translation?

kzmaker

MuTox dataset not accessible

3

Hi, the link for the [MuTox](https://github.com/facebookresearch/seamless_communication/blob/main/src/seamless_communication/cli/toxicity/mutox/README.md) dataset given in the README is not accessible. Kindly update the dataset link.

BhashaBluff

Confidence scores for the predictions generated?

Is there a way to get the output logits, or somehow get word-level confidence scores for the predictions generated?

Awaisn25
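A common way to get a per-token confidence is to take the softmax probability of the token actually emitted at each decoding step. This requires the decoder's output logits for each step. Word-level scores can then be formed by aggregating the tokens of each word. Here the minimum is used; a geometric mean is another option.

The sketch below shows this in minimal form, in pure Python. It assumes you already have per-step logits and the emitted token ids; retrieving them from Seamless M4T is not shown. The grouping of subword pieces into words is also an assumption: it uses the SentencePiece-style `▁` word-start marker.

```python
import math
from typing import List, Sequence, Tuple


def token_confidences(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[float]:
    """Softmax probability of each emitted token at its decoding step."""
    confs = []
    for step_logits, tok in zip(logits, token_ids):
        m = max(step_logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in step_logits]
        confs.append(exps[tok] / sum(exps))
    return confs


def word_confidences(pieces: Sequence[str], confs: Sequence[float]) -> List[Tuple[str, float]]:
    """Merge subword pieces into words, assuming '\u2581' marks the start of a word.

    Each word's score is the confidence of its least confident piece.
    """
    words: List[List] = []
    for piece, c in zip(pieces, confs):
        if piece.startswith("\u2581") or not words:
            words.append([piece.lstrip("\u2581"), c])
        else:
            words[-1][0] += piece
            words[-1][1] = min(words[-1][1], c)
    return [(w, c) for w, c in words]


# Toy example: vocabulary of 3, three decoding steps.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
token_ids = [0, 2, 1]
pieces = ["\u2581Hal", "lo", "\u2581Welt"]

confs = token_confidences(logits, token_ids)
words = word_confidences(pieces, confs)
print(words)  # "Hallo" gets ~0.33 (from the uncertain "lo" step), "Welt" ~0.91
```

Softmax probabilities from neural decoders are often over-confident. Treat these scores as relative indicators rather than calibrated probabilities.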
