seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Results 226 seamless_communication issues

Does speech recognition support streaming inference? How can I change it?

Regarding the following sentence, I think the translation result is not as good as Google Translate.

```
Hi, it's been fans, welcome to another installment at Spring Tips.
```

The...

I am trying to fine-tune the model for TTS. I prepared the train_manifest.json and eval_manifest.json files with my own code instead of using the dataset scripts. My train_manifest.json should look like the same...

When I run the code below, there is a warning message. Is it an error? How can I handle it?

```python
from transformers import SeamlessM4TTokenizer

tokenizer = SeamlessM4TTokenizer.from_pretrained(
    "facebook/hf-seamless-m4t-medium",
    src_lang="eng"...
```

```python
# T2ST
input_text = "how do you do"
src_lang = "eng"
tgt_lang = "eng"
path_to_save_audio = "./test.wav"
translated_text, wav, sr = translator.predict(input_text, "t2st", tgt_lang, src_lang=src_lang, ngram_filtering=True)
# print(wav.shape)
torchaudio.save(path_to_save_audio,...
```

PIP version: `pip 23.2.1`

Python version: `python 3.10`

Error message after running `pip install fairseq2==0.1`:

```
Collecting fairseq2==0.1
  Obtaining dependency information for fairseq2==0.1 from https://files.pythonhosted.org/packages/cd/27/46c14e28e8cb0aa602660ce64d4547a37f460d382e4fcf94f2a53d47e5b0/fairseq2-0.1.0-py3-none-any.whl.metadata
  Using cached fairseq2-0.1.0-py3-none-any.whl.metadata (1.2 kB)...
```

## Summary

- added pyproject.toml file for building the library, following [PEP 518](https://peps.python.org/pep-0518/);
- added poetry-based dependency management to lock down dependency versions for reproducibility;

## Tests

- build successfully...

CLA Signed

When I use the `wet_lines` script to download and gather aligned text information from the metadata, an error occurs. The error message is as below. So what should I...

I have noticed that when the input audio file exceeds approximately 30 seconds in duration, the resulting output file contains only the first 10 seconds or the last 10 seconds...
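One possible workaround for long inputs, not confirmed anywhere in this thread, is to split the audio into shorter consecutive chunks and translate each chunk separately. The sketch below only computes the chunk boundaries in samples. The helper name `chunk_bounds` and the 20-second limit are illustrative assumptions, not part of seamless_communication.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int, sample_rate: int, max_seconds: float = 20.0
) -> List[Tuple[int, int]]:
    """Return consecutive (start, end) sample indices that cover the audio.

    Hypothetical helper: the 20 s default is an assumed limit, not a value
    documented by the library.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    chunk_len = int(sample_rate * max_seconds)  # samples per chunk
    if chunk_len == 0:
        raise ValueError("max_seconds is too small for this sample rate")
    # Step through the signal; the last chunk may be shorter than chunk_len.
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# Example: 45 s of 16 kHz audio, split into chunks of at most 20 s.
print(chunk_bounds(45 * 16000, 16000, max_seconds=20.0))
# → [(0, 320000), (320000, 640000), (640000, 720000)]
```

Each `(start, end)` pair can be used to slice the waveform before passing that segment to the translator. Fixed boundaries can cut a word in half, so splitting at silences would likely give better results.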

If it has, how do I use it?

    translated_text, wav, sr = translator.predict(
        input='/content/drive/MyDrive/GPI/WAV/1.wav',
        task_str='s2st',
        tgt_lang='spa',  # target language
        src_lang='spa',  # source language
        # If you specify this, it will improve...