seamless_communication
ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead.
Hello, I am getting the error below and can't find a solution. Does anyone know what I should do? I asked ChatGPT and tried converting the input audio file to stereo and to mono, but nothing worked. Thanks in advance.
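For anyone trying to reproduce this, here is a minimal sketch of the shape check that the error message and the transpose warning in the log below seem to be about. It is plain Python working on a shape tuple only. `normalize_waveform_shape` is a hypothetical helper written for illustration, not a function from seamless_communication or fairseq2, and the rule that the longer axis is time is an assumption.

```python
from typing import Tuple


def normalize_waveform_shape(shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Return the (seq_len, channels) shape a waveform would have once normalized.

    Hypothetical helper for illustration only; it is not part of
    seamless_communication or fairseq2.
    """
    if len(shape) == 1:
        # Mono audio decoded as a flat vector of samples: add a channel axis.
        return (shape[0], 1)
    if len(shape) == 2:
        rows, cols = shape
        # Assumption: the longer axis is time, so (channels, seq_len)
        # is swapped to (seq_len, channels).
        if rows < cols:
            return (cols, rows)
        return (rows, cols)
    # len(shape) is the number of dimensions, not the number of samples.
    raise ValueError(
        f"expected a 1-D or 2-D waveform, got {len(shape)} dimension(s)"
    )


if __name__ == "__main__":
    print(normalize_waveform_shape((102528,)))
    print(normalize_waveform_shape((2, 102528)))
```

The number of dimensions is `len(shape)`, so the 102528 in the error message looks more like a sample count than a dimension count. That reading is a guess on my part and is not confirmed by the log.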
----@---- seamless_communication % m4t_predict input/speech.mp3 --task S2ST --tgt_lang FRA --output_path /Users/username/seamless_communication/output/compl.mp3
2024-09-07 01:34:47,221 INFO -- seamless_communication.cli.m4t.predict.predict: Running inference on device=device(type='cpu') with dtype=torch.float32.
Using the cached checkpoint of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set `force` to `True` to download again.
Using the cached checkpoint of vocoder_v2. Set `force` to `True` to download again.
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
2024-09-07 01:35:09,103 INFO -- seamless_communication.cli.m4t.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_ngram_filtering=False
2024-09-07 01:35:09,141 WARNING -- seamless_communication.inference.translator: Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz).
Traceback (most recent call last):
File "/opt/homebrew/bin/m4t_predict", line 8, in