seamless_communication
ASR task fails in the CLI
I tried to test the ASR task from the CLI, but it failed. Am I missing anything?
$ m4t_predict --model seamlessM4T_medium 16k.wav asr eng
2023-08-23 16:17:41,203 INFO -- m4t_scripts.predict.predict: Running inference on the GPU.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
Traceback (most recent call last):
....
  File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kaisermac/miniconda3/lib/python3.11/site-packages/fairseq2/nn/transformer/relative_attention.py", line 293, in forward
    raise ValueError(
ValueError: The input sequence length must be less than or equal to the maximum sequence length (4096), but is 16272 instead.
@kaiser-ok From the error description, it looks like the waveform you fed to the model exceeds the maximum sequence length once it is converted to log-mel filterbanks. Could you please try running it with a shorter audio file and see if that fixes the problem?
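One way to try the shorter-audio suggestion without re-recording is to cut the WAV file into fixed-length pieces first. The following is only a minimal sketch using Python's standard `wave` module; the helper name `split_wav`, the `chunk` file prefix, and the 20-second default are assumptions for illustration. The thread does not state how many seconds of audio correspond to the 4096-position limit, so the chunk length may need tuning.

```python
import wave


def split_wav(src_path, chunk_seconds=20.0, prefix="chunk"):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper, not part of seamless_communication or fairseq2.
    Returns the list of chunk file paths in playback order.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of audio frames (samples per channel) in one chunk.
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source file
            out_path = f"{prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and sample rate; the
                # frame count in the header is corrected when dst is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting file could then be passed to `m4t_predict` in place of `16k.wav`. Fixed-length cuts can split a word in the middle, so cutting at silences would likely give better transcripts; that is left out of this sketch.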
@cbalioglu I got the same error as well. It looks like it is due to the long audio file. Would it be possible to support long audio in the future?
So how long can the audio be? I tested with a 1-minute 16 kHz WAV file and tgt_lang "cmn", and the result was very poor.