seamless_communication issues

Results of ASR are incomplete

3

My issue seems very similar to https://github.com/facebookresearch/seamless_communication/issues/83, but I am using the Translator Python API with the ASR task. My input is 30 seconds long, and I get about half of...

ysapolovych
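
A common way to probe (and sometimes work around) truncated transcripts on longer clips is to split the audio into shorter overlapping windows and transcribe each window separately. The sketch below only computes the window boundaries. It assumes the truncation is related to input length, which the excerpt above does not confirm; `chunk_bounds` and its default window sizes are hypothetical and not part of `seamless_communication`.

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds=10.0, overlap_seconds=1.0):
    """Return (start, end) sample indices that cover a clip in overlapping windows.

    Hypothetical helper: the 10 s window and 1 s overlap are illustrative
    defaults, not values recommended by the seamless_communication project.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")

    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window already reaches the end of the clip
        start += step
    return bounds


# A 30-second clip sampled at 16 kHz.
windows = chunk_bounds(num_samples=30 * 16000, sample_rate=16000)
```

With a waveform loaded by `torchaudio.load` (shape `(channels, samples)`), each window can be sliced as `waveform[:, start:end]` and transcribed on its own. If the per-window transcripts are complete while the full-clip transcript is not, that points toward a length-related limit.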

I tried running the T2TT model on a CPU to translate English (eng) to Chinese (cmn), but encountered many <unk> symbols

My Python code:

```python
import torch
import torchaudio
from seamless_communication.models.inference import Translator

translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cpu"), torch.float32)
translated_text, _, _ = translator.predict(subtitle.text, "t2tt", 'cmn', src_lang='eng')
```

And the output: Oh, they...

belliedmonkey

How to identify the input audio language?

1

For input audio --> output audio: how can the input language be identified before translating or converting anything?

anilcs13m

ASR always returns "- What?"

I tried to adapt this snippet of code: https://huggingface.co/facebook/seamless-m4t-unity-small-s2t#inference

```python
import torchaudio
import torch
import sounddevice as sd
import numpy as np
import soundfile as sf

audio_input, _ = torchaudio.load("example.wav")
...
```

Raigan5

Outputs too many <unk> symbols with Mandarin Chinese (cmn & cmn_Hant) and Cantonese (yue)

6

For example, "Oh, Peter." is translated to "<unk>,彼得." and "Oh, my god" is translated to "<unk>,我的上帝". Almost all of the "Oh" are translated into **<unk>**, making this project almost unusable for Chinese and...

tanshuai
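
To make a report like this easier to compare across language pairs or model versions, it can help to quantify how often the unknown token appears. The sketch below is a plain-Python tally over already-translated strings. `unk_stats` is a hypothetical helper, and it assumes the unknown token appears literally as `<unk>` in the output text, as in the issue title.

```python
UNK = "<unk>"  # assumed literal rendering of the unknown token


def unk_stats(translations):
    """Summarize how many outputs contain <unk> and how often it occurs."""
    total = len(translations)
    with_unk = sum(1 for text in translations if UNK in text)
    occurrences = sum(text.count(UNK) for text in translations)
    return {"total": total, "with_unk": with_unk, "occurrences": occurrences}


# Illustrative strings in the style of the outputs quoted in the report.
sample_outputs = ["<unk>,彼得.", "<unk>,我的上帝", "你好"]
stats = unk_stats(sample_outputs)
```

Running the same tally on outputs for `cmn`, `cmn_Hant`, and `yue` would show whether the problem is concentrated on particular source words (such as "Oh") or spread across the vocabulary.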

Add Replicate demo and API

Great work on `seamless_communication` and impressive results! This pull request makes it possible to run `seamless_communication` demo on Replicate (https://replicate.com/cjwbw/seamless_communication) and via API (https://replicate.com/cjwbw/seamless_communication/api)

chenxwh

CLA Signed

SeamlessM4T_large Model Produces Gibberish Output in Colab

8

#### Description

When running the `SeamlessM4T_large` model in a Colab notebook, the output becomes repetitive and gibberish. This issue is not present when using the same model in Hugging Face...

pratikshappai
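
When comparing outputs from two environments (here, Colab versus Hugging Face), a rough repetition score can turn "repetitive and gibberish" into a number. The sketch below computes the share of word n-grams that occur more than once. `repeated_ngram_ratio` is a hypothetical helper, not something the project provides, and it splits on whitespace, so it is only meaningful for languages written with spaces between words.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams in `text` that occur more than once.

    0.0 means no n-gram repeats; values near 1.0 indicate looping output.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text is shorter than n tokens

    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)


looping = repeated_ngram_ratio("a b c a b c a b c")
normal = repeated_ngram_ratio("the quick brown fox jumps")
```

A score that is consistently much higher in one environment than the other for the same inputs would support the report and give maintainers something concrete to reproduce.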

Licensing and speech APIs (Dear Mark)

8

I just want to put it on the record here that achieving anything close to what this model provides is prohibitively expensive and prone to technical issues, for any startup....

bitnom

Low quality results compared to Huggingface spaces and Demo!

4

Hello everyone, I tried to test the reported results on Google Colab, but when I compared the results with [Huggingface space](https://huggingface.co/spaces/facebook/seamless_m4t) using the same inputs, it appeared that there is...

arash-aut

Install issue on ARM64 / embedded GPU

2

I have the same installation issue on an ARM64 system with an NVIDIA GPU (16 GB or 64 GB) and CUDA 11.2. `pip install --verbose --trusted-host fair-package-repo.s3-website-us-east-1.amazonaws.com --extra-index-url http://fair-package-repo.s3-website-us-east-1.amazonaws.com/fairseq2/whl/stable/pt2.0.1/cu118 fairseq2 --verbose` Using pip 23.2.1 from...

rebotnix
