CosyVoice: prompt reference audio format bug in 3s fast cloning (3s极速复刻)

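I'd like to know what the requirements are for the prompt audio format. A prompt audio file with a 32 kHz sample rate fails with:

RuntimeError: Cannot load audio from file: ffprobe not found. Please install ffmpeg in your system to use non-WAV audio file formats and make sure ffprobe is in your PATH.

The error suggests the input is not a valid audio file, yet inputs at 16 kHz and 24 kHz generate normally. Besides being no lower than 16 kHz, are there any other restrictions on the sample rate?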

Jul 10 '24 06:07 ZHUHF123

This is due to the audio file format: the file is not in WAV format. Please convert it to WAV first. What audio format are you using?

Jul 10 '24 06:07 aluminumbox

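The file that fails is a .wav file. Compared with the other .wav files that work, the only difference is its 32 kHz sample rate, so my question is: what range of sample rates is supported for the prompt audio?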

Jul 10 '24 06:07 ZHUHF123

Your file name may end with .wav, but the log shows that the file is not in WAV format. Any sample rate greater than 16 kHz is OK.

Jul 10 '24 06:07 aluminumbox

It is a playable .wav file. How can I verify whether this audio is actually in WAV format?

Jul 10 '24 07:07 ZHUHF123

The full error is as follows:

RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
  warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/processing_utils.py", line 544, in audio_from_file
    audio = AudioSegment.from_file(filename)
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/pydub/audio_segment.py", line 728, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/pydub/utils.py", line 274, in mediainfo_json
    res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/subprocess.py", line 1720, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/blocks.py", line 1941, in process_api
    inputs = await self.preprocess_data(
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/blocks.py", line 1655, in preprocess_data
    processed_input.append(block.preprocess(inputs_cached))
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/components/audio.py", line 218, in preprocess
  File "/home/lixiufeng/.conda/envs/cosyvoice/lib/python3.8/site-packages/gradio/processing_utils.py", line 554, in audio_from_file
    raise RuntimeError(msg) from e
RuntimeError: Cannot load audio from file: ffprobe not found. Please install ffmpeg in your system to use non-WAV audio file formats and make sure ffprobe is in your PATH.

Jul 10 '24 07:07 ZHUHF123

Try torchaudio.load(wav) to see whether it raises an error.
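In addition to the torchaudio check, it may help to look at the file header directly, since a loader can succeed on several encodings that all carry a .wav extension. The sketch below uses only the Python standard library to read the RIFF/WAVE header and the "fmt " chunk. The helper name describe_wav is hypothetical and not part of CosyVoice, gradio, or pydub. In the WAV format, a format tag of 1 means integer PCM, 3 means IEEE float, and 0xFFFE means the extensible format. Whether a non-PCM tag is what sends this particular file down the ffprobe path is an assumption that the thread does not confirm.

```python
import struct


def describe_wav(path):
    """Return basic header facts for a RIFF/WAVE file, or None if it is not one.

    Hypothetical helper for illustration; it only inspects the header.
    """
    with open(path, "rb") as f:
        header = f.read(12)
        # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
        if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            return None
        # Walk the chunks until the "fmt " chunk is found.
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                return None  # reached end of file without a "fmt " chunk
            chunk_id, size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(size)
                if len(fmt) < 16:
                    return None
                # wFormatTag, nChannels, nSamplesPerSec,
                # nAvgBytesPerSec, nBlockAlign, wBitsPerSample
                tag, channels, rate, _, _, bits = struct.unpack("<HHIIHH", fmt[:16])
                return {
                    "format_tag": tag,  # 1 = PCM, 3 = IEEE float, 0xFFFE = extensible
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                }
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(size + (size & 1), 1)
```

Calling describe_wav on both a working file and the failing 32 kHz file and comparing the two results would show whether they differ in anything other than sample rate, for example in format tag or bit depth.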

Jul 10 '24 15:07 aluminumbox

torchaudio.load works on all of the files and the results print normally; each one has the form (tensor, sample rate).

Jul 11 '24 01:07 ZHUHF123

apt install ffmpeg
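After installing, it can be worth confirming that the tools are visible from the same Python environment that runs the web UI, since the traceback above fails specifically on locating 'ffprobe'. A minimal sketch using only the standard library follows. The function name find_ffmpeg_tools is hypothetical, and the need to check inside the conda environment is an assumption about this setup.

```python
import shutil


def find_ffmpeg_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_ffmpeg_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If ffprobe is reported as not found here, the web UI process will not find it either when started from the same shell and environment.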

Jul 12 '24 01:07 qxde01

I ran into the same problem: the sample rate is above 16 kHz, the audio is a .wav file, and the bit depth is 32. The error is: FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe'. The log also appears to print:

UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/pydub/utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work

To address the error above I installed ffmpeg, but a different error is still raised:

/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/processing_utils.py:738: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/blocks.py", line 2043, in process_api
    data = await self.handle_streaming_outputs(
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/blocks.py", line 1870, in handle_streaming_outputs
    binary_data, output_data = await block.stream_output(
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/components/audio.py", line 361, in stream_output
    value, duration = await self.covert_to_adts(binary_data)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/components/audio.py", line 330, in covert_to_adts
    return await anyio.to_thread.run_sync(Audio._convert_to_adts, data)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    return await future
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/gradio/components/audio.py", line 321, in _convert_to_adts
    segment = AudioSegment.from_file(io.BytesIO(data))
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/pydub/audio_segment.py", line 728, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/pydub/utils.py", line 279, in mediainfo_json
    info = json.loads(output)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ubuntu/anaconda3/envs/cosyvoice/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Mar 18 '25 09:03 china-share

Same problem here.

Sep 14 '25 08:09 poerlang