seamless_communication
Results of ASR are incomplete
My issue seems very similar to https://github.com/facebookresearch/seamless_communication/issues/83 , but I am using the Translator Python API with the ASR task. My input is 30 seconds long, and only about half of it gets transcribed:
```python
from seamless_communication.models.inference import Translator
import torch

device = torch.device('cuda:0')
translator = Translator('seamlessM4T_medium',
                        vocoder_name_or_card='vocoder_36langs',
                        device=device,
                        dtype=torch.float32)
full_text, wav, out_sr = translator.predict(input='1694165663513.wav',
                                            task_str='ASR',
                                            tgt_lang='eng',
                                            src_lang='eng',
                                            sample_rate=16000,
                                            ngram_filtering=True)
```
I wonder if the params `text_max_len_a`, `text_max_len_b`, `unit_max_len_a`, and `unit_max_len_b` of the `predict` method somehow contribute to that (alas, they are undocumented). Playing with them, however, did nothing.
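One way to check whether the truncation depends on input length is to split the audio into shorter pieces and transcribe each piece separately. The following is only a minimal sketch of that idea, not something the library provides: `chunk_bounds`, `transcribe_in_chunks`, and the `transcribe_fn` callback are hypothetical names introduced here, and the 10-second default chunk length is an arbitrary assumption.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_seconds: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices that cover the whole signal."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


def transcribe_in_chunks(samples: Sequence,
                         sample_rate: int,
                         transcribe_fn: Callable[[Sequence], str],
                         chunk_seconds: float = 10.0) -> str:
    """Transcribe each chunk with transcribe_fn and join the non-empty results.

    transcribe_fn is a hypothetical callback (an assumption of this sketch):
    it takes one chunk of samples and returns its transcription as a string.
    """
    pieces = []
    for start, end in chunk_bounds(len(samples), sample_rate, chunk_seconds):
        text = transcribe_fn(samples[start:end]).strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)
```

With the `Translator` above, `transcribe_fn` could wrap `translator.predict(...)` and return only the text as a `str`, assuming `predict` accepts an in-memory waveform as well as a file path, which should be checked against the installed version. Fixed-length chunks can cut a word in half at a boundary, so this is better suited to diagnosing the problem than to producing a final transcript.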
Yes. If I try 60 seconds of audio, I get about 20 seconds transcribed. If I send 20 seconds, I get 10 seconds transcribed. If I send 10 seconds of audio, I get 5 seconds transcribed.
Anything on this?
Thank you
I am running into the same problem. Does anyone know the cause?