failed in asr task

Open kaiser-ok opened this issue 2 years ago • 3 comments

I tried to run the ASR task from the CLI, but it failed. Am I missing anything?

$ m4t_predict --model seamlessM4T_medium 16k.wav asr eng
2023-08-23 16:17:41,203 INFO -- m4t_scripts.predict.predict: Running inference on the GPU.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
Traceback (most recent call last):
  ....
  File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/fairseq2/nn/transformer/relative_attention.py", line 293, in forward
    raise ValueError(
ValueError: The input sequence length must be less than or equal to the maximum sequence length (4096), but is 16272 instead.

kaiser-ok Aug 23 '23 14:08

@kaiser-ok From the error description, it looks like the waveform you are feeding to the model exceeds the maximum sequence length once it is converted to log-mel filterbanks. Could you please try running it with a shorter audio file and see if that fixes the problem?

cbalioglu Aug 23 '23 16:08
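
For anyone who wants to try the "shorter audio file" suggestion without re-recording, here is a minimal sketch that cuts a WAV file into fixed-length pieces using only Python's standard `wave` module. It is not part of seamless_communication, and the helper names `chunk_bounds` and `split_wav` are made up for illustration. The 20-second default is an assumption, not a documented limit; you would need to find a length that stays under the model's maximum sequence length by experiment.

```python
import wave


def chunk_bounds(num_frames, frame_rate, max_seconds):
    """Return (start, end) frame-index pairs that cover the audio
    in consecutive pieces of at most max_seconds each."""
    step = int(frame_rate * max_seconds)
    if step < 1:
        raise ValueError("max_seconds is too small for this frame rate")
    return [(start, min(start + step, num_frames))
            for start in range(0, num_frames, step)]


def split_wav(path, out_prefix, max_seconds=20.0):
    """Write consecutive chunks of a PCM WAV file to
    <out_prefix>_000.wav, <out_prefix>_001.wav, ... and return the paths.

    max_seconds=20.0 is an assumed value, not a limit documented
    by the model authors.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bounds = chunk_bounds(src.getnframes(), src.getframerate(), max_seconds)
        for i, (start, end) in enumerate(bounds):
            src.setpos(start)                    # seek to the chunk start
            frames = src.readframes(end - start)
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)            # same channels, width, rate
                dst.writeframes(frames)          # header frame count is fixed up on close
            out_paths.append(out_path)
    return out_paths
```

Calling `split_wav("16k.wav", "16k_part")` would produce `16k_part_000.wav`, `16k_part_001.wav`, and so on, and each piece could then be passed to `m4t_predict` with the same arguments as in the original report. Note that cutting at fixed boundaries can split a word in half, so the transcript may be rough around the joins.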

@cbalioglu I got the same error as well. It looks like it is due to the long audio file. Is it possible to support long audio in the future?

tmclouisluk Aug 24 '23 02:08

So, how long can the audio be? I tested with a 1-minute 16 kHz WAV file and tgt_lang "cmn", and the result was very poor.

eeewhe Aug 24 '23 02:08