ChatTTS & CosyVoice2-0.5B both fail with the same error: Couldn't allocate AVFormatContext
System Info
- Ubuntu 22
- NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0
- Conda 25.7.0
- Python 3.11
Running Xinference with Docker?
- [ ] docker
- [x] pip install
- [ ] installation from source
Version info
Version: 1.11.0.post1
The command used to start Xinference
HF_ENDPOINT=https://hf-mirror.com xinference-local --host 0.0.0.0
Reproduction
- In a clean conda environment, install only the audio extra: pip install xinference[audio]
- Start Xinference, launch ChatTTS or CosyVoice2-0.5B from the web UI, then open its launch UI and run a simple TTS test.
- Both models fail with the same error: Couldn't allocate AVFormatContext. The destination file is <_io.BytesIO object at 0x......>, check the desired extension? Invalid argument
- Following https://github.com/xorbitsai/inference/issues/2739, I ran conda install -c conda-forge "ffmpeg<7", but the error is unchanged.
- So far, F5-TTS is the only TTS model I have running.
Expected behavior
ChatTTS and CosyVoice2-0.5B run normally.
I ran cosyvoice and it worked fine.
Your error looks like it is caused by ffmpeg.
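I followed your suggestion and ran conda install -c conda-forge "ffmpeg<7". Which version should I use?
@qinxuye I rented a clean machine, reinstalled, and retested, and cosyvoice still fails. So why does it work for you? I have now tried two brand-new machines and get the same error on both.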
File "/home/vllm/miniconda3/envs/vllm_env/lib/python3.11/site-packages/torchcodec/_core/ops.py", line 69, in load_torchcodec_shared_libraries
raise RuntimeError(
^^^^^^^^^^^^^^^^^
RuntimeError: [address=0.0.0.0:36463, pid=8918] Could not load libtorchcodec. Likely causes:
1. FFmpeg is not properly installed in your environment. We support
versions 4, 5, 6 and 7.
2. The PyTorch version (2.9.0+cu128) is not compatible with
this version of TorchCodec. Refer to the version compatibility
table:
https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec.
3. Another runtime dependency; see exceptions below.
The following exceptions were raised as we tried to load libtorchcodec:
[start of libtorchcodec loading traceback]
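The traceback names the FFmpeg version and PyTorch/TorchCodec compatibility as the likely causes, so when triaging it helps to record those versions side by side. A minimal stdlib-only sketch, assuming the distributions are installed under the names torch, torchaudio, torchcodec and xinference; the helper names are hypothetical. Note that the ffmpeg binary found on PATH is only a proxy for the FFmpeg shared libraries TorchCodec actually loads.

```python
import importlib.metadata
import shutil
import subprocess

# Distributions whose versions matter for the torchaudio -> TorchCodec -> FFmpeg chain.
PACKAGES = ("torch", "torchaudio", "torchcodec", "xinference")


def package_versions(names=PACKAGES):
    """Return {name: version string or None if the distribution is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for name, version in package_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(f"ffmpeg: {ffmpeg_version_line() or 'not found on PATH'}")
```

Pasting this output into a report makes it easy to spot combinations like the torch 2.9.0 / torchcodec 0.8.0 pair discussed below.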
On this new machine, following https://github.com/meta-pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec, I ran conda install "ffmpeg<8":
(vllm_env) vllm@iZ2zedd5pe69teumom3tcoZ:~/.xinference/logs/local_1761371197520$ ffmpeg -version
ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 11.2.0 (Anaconda gcc)
configuration: --prefix=/home/vllm/miniconda3/envs/vllm_env --cc=/home/task_175975072617190/conda-bld/ffmpeg_1759751615537/_build_env/bin/x86_64-conda-linux-gnu-cc --ar=/home/task_175975072617190/conda-bld/ffmpeg_1759751615537/_build_env/bin/x86_64-conda-linux-gnu-ar --nm=/home/task_175975072617190/conda-bld/ffmpeg_1759751615537/_build_env/bin/x86_64-conda-linux-gnu-nm --ranlib=/home/task_175975072617190/conda-bld/ffmpeg_1759751615537/_build_env/bin/x86_64-conda-linux-gnu-ranlib --strip=/home/task_175975072617190/conda-bld/ffmpeg_1759751615537/_build_env/bin/x86_64-conda-linux-gnu-strip --disable-doc --enable-swresample --enable-swscale --enable-openssl --enable-libxml2 --enable-libtheora --enable-demuxer=dash --enable-postproc --enable-hardcoded-tables --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig --enable-libdav1d --enable-zlib --enable-libaom --enable-pic --enable-shared --disable-static --disable-gpl --enable-version3 --disable-sdl2 --enable-libopenh264 --enable-libopus --enable-libmp3lame --enable-libopenjpeg --enable-libvorbis --enable-pthreads --enable-libtesseract --enable-libvpx
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
It still fails in the end: RuntimeError: Couldn't allocate AVFormatContext. The destination file is <_io.BytesIO object at 0x7f3028567f10>, check the desired extension? Invalid argument
I finally found the cause! A plain pip install xinference[audio] installs the latest torch and torchcodec:
torch 2.9.0
torch-complex 0.4.4
torchaudio 2.9.0
torchcodec 0.8.0
The current xinference release probably can't work with these versions. I tried changing ..model/audio/utils.py without success, so I simply downgraded:
uv pip install torch==2.8.0 torchaudio==2.8.0 torchvision==0.23.0 torchcodec==0.7.0
and that fixed it.
OK, we'll look into why the latest versions fail.
Correction: torchvision==0.23.0 is for video and isn't needed.
I think the problem is torchaudio.save. From the release notes at https://github.com/pytorch/audio/releases/tag/v2.9.0:
torchaudio.load() and torchaudio.save() still exist, but their underlying implementation now relies on TorchCodec.
So I tried to modify this function in ./xinference/model/audio/utils.py:
def audio_to_bytes(response_format: str, sample_rate: int, tensor: "torch.Tensor"):
    import torchaudio

    response_pcm = response_format.lower() == "pcm"
    with io.BytesIO() as out:
        if response_pcm:
            logger.info(f"PCM output, num_channels: 1, sample_rate: {sample_rate}")
            torchaudio.save(out, tensor, sample_rate, format="wav", encoding="PCM_S")
            # http://soundfile.sapp.org/doc/WaveFormat
            return _extract_pcm_from_wav_bytes(out.getvalue())
        else:
            torchaudio.save(out, tensor, sample_rate, format=response_format)
            return out.getvalue()
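The PCM branch writes a WAV into the buffer and then passes it to _extract_pcm_from_wav_bytes, whose body is not shown in this thread. Per the WaveFormat layout linked in the comment, a WAV is a 12-byte RIFF/WAVE header followed by chunks (a 4-byte ID, a little-endian 32-bit size, then the body, padded to an even length), and the raw samples live in the "data" chunk. A minimal stdlib sketch of what such a helper could do; the name extract_pcm_from_wav_bytes is hypothetical, not xinference's actual implementation, and treating an oversized or placeholder length as "until end of buffer" is an assumption.

```python
import io
import struct
import wave


def extract_pcm_from_wav_bytes(wav: bytes) -> bytes:
    """Return the body of the first 'data' chunk of a RIFF/WAVE buffer.

    Hypothetical stand-in for xinference's _extract_pcm_from_wav_bytes.
    """
    if len(wav) < 12 or wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE buffer")
    pos = 12  # first sub-chunk starts after 'RIFF', <size>, 'WAVE'
    while pos + 8 <= len(wav):
        chunk_id = wav[pos:pos + 4]
        (size,) = struct.unpack_from("<I", wav, pos + 4)
        body_start = pos + 8
        if chunk_id == b"data":
            # Assumption: a streaming writer may leave a placeholder size
            # (e.g. 0xFFFFFFFF); if it overruns the buffer, read to the end.
            body_end = min(body_start + size, len(wav))
            return wav[body_start:body_end]
        # Skip this chunk; bodies are padded to an even number of bytes.
        pos = body_start + size + (size & 1)
    raise ValueError("no 'data' chunk found")


if __name__ == "__main__":
    # Build a tiny mono 16-bit WAV in memory with the stdlib, then strip the header.
    frames = struct.pack("<4h", 0, 1000, -1000, 32767)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(frames)
    assert extract_pcm_from_wav_bytes(buf.getvalue()) == frames
    print("ok")
```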
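One more suggestion: the audio extra below is specified too loosely. An earlier problem I hit was that not using pynini 2.6 broke Python 3.12; this time the problem is that torchaudio isn't pinned to a version matching torch: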
audio =
    funasr==1.2.7
    omegaconf~=2.3.0
    nemo_text_processing<1.1.0; sys_platform == 'linux' # 1.1.0 requires pynini==2.1.6.post1
    WeTextProcessing<1.0.4; sys_platform == 'linux' # 1.0.4 requires pynini==2.1.6
    librosa
    xxhash
    torchaudio
    ChatTTS>=0.2.1
    tiktoken # For CosyVoice, openai-whisper
    torch>=2.0.0 # For CosyVoice, matcha
Until torchaudio 2.9 is supported, it would be better to change it like this:
    torch>=2.0.0,<2.9.0 # Pin to <2.9.0 to avoid BytesIO issues with torchcodec
    torchaudio>=2.0.0,<2.9.0 # Pin to <2.9.0 to maintain compatibility
    torchcodec>=0.6.0,<0.8.0 # Compatible with torch 2.8
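The three pins proposed above can be checked against an environment with a few lines of stdlib Python. A simplified sketch: it compares only the numeric release part (so "2.8.0+cu128" counts as 2.8.0), which is not full PEP 440 ordering; for exact pip semantics, packaging.specifiers.SpecifierSet is the tool to use. The helper names are hypothetical.

```python
import re

# Ranges from the proposed pins: package -> (inclusive lower, exclusive upper).
PINS = {
    "torch": ((2, 0, 0), (2, 9, 0)),
    "torchaudio": ((2, 0, 0), (2, 9, 0)),
    "torchcodec": ((0, 6, 0), (0, 8, 0)),
}


def release_tuple(version: str) -> tuple:
    """'2.8.0+cu128' -> (2, 8, 0). Drops local (+...) and pre/dev suffixes."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.8' -> (0, 8, 0)
    return tuple(parts[:3])


def pin_violations(installed: dict) -> list:
    """Return a message for each installed package outside its pinned range."""
    problems = []
    for name, (low, high) in PINS.items():
        version = installed.get(name)
        if version is None:
            continue  # not installed: nothing to check
        if not (low <= release_tuple(version) < high):
            lo_s = ".".join(map(str, low))
            hi_s = ".".join(map(str, high))
            problems.append(f"{name} {version} is outside [{lo_s}, {hi_s})")
    return problems


if __name__ == "__main__":
    broken = {"torch": "2.9.0", "torchaudio": "2.9.0", "torchcodec": "0.8.0"}
    working = {"torch": "2.8.0+cu128", "torchaudio": "2.8.0+cu128", "torchcodec": "0.7.0"}
    print(pin_violations(broken))
    print(pin_violations(working))
```

Feeding it the versions from importlib.metadata.version(...) turns it into a quick pre-launch check of a live environment.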
@qinxuye With torch==2.9.0 and torchaudio==2.9.0, I tried the following change and it seems to work:
def audio_to_bytes(response_format: str, sample_rate: int, tensor: "torch.Tensor"):
    import soundfile as sf

    response_pcm = response_format.lower() == "pcm"
    # Convert tensor to numpy and transpose to [time, channel] for soundfile
    audio_np = tensor.cpu().numpy().T if tensor.ndim == 2 else tensor.cpu().numpy()
    with io.BytesIO() as out:
        if response_pcm:
            logger.info(f"PCM output, num_channels: 1, sample_rate: {sample_rate}")
            sf.write(out, audio_np, sample_rate, format="WAV", subtype="PCM_16")
            return _extract_pcm_from_wav_bytes(out.getvalue())
        else:
            sf.write(out, audio_np, sample_rate, format=response_format.upper())
            return out.getvalue()
In short: replace torchaudio.save() with sf.write().
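The change sidesteps the error because soundfile encodes through libsndfile rather than FFmpeg/TorchCodec. For the WAV/PCM path specifically, the same in-memory encoding can even be done with only the standard library. A minimal sketch, assuming mono float samples in [-1.0, 1.0] passed as a plain Python sequence (not a torch tensor) and one common scaling convention (clip, then multiply by 32767); the name float_to_wav_bytes is hypothetical and this is not what xinference ships. It covers WAV only, so formats such as mp3 would still need soundfile or an FFmpeg-based encoder.

```python
import io
import struct
import wave


def float_to_wav_bytes(samples, sample_rate: int) -> bytes:
    """Encode mono float samples as a 16-bit PCM WAV held in memory.

    Values outside [-1.0, 1.0] are clipped before scaling by 32767.
    """
    clipped = [max(-1.0, min(1.0, float(x))) for x in samples]
    pcm = struct.pack(f"<{len(clipped)}h", *(int(round(x * 32767)) for x in clipped))
    with io.BytesIO() as out:
        with wave.open(out, "wb") as w:
            w.setnchannels(1)  # mono, matching the num_channels: 1 log line
            w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
            w.setframerate(sample_rate)
            w.writeframes(pcm)
        # wave leaves a caller-supplied file object open, so the bytes are still readable.
        return out.getvalue()


if __name__ == "__main__":
    data = float_to_wav_bytes([0.0, 0.5, -1.0, 2.0], 16000)
    print(len(data), data[:4], data[8:12])
```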
I arrived at this change by writing a test script and reading the warnings it printed:
(test_torch29) vllm@VM-3-219-ubuntu:~/f5-tts-test$ python test_audio_save.py
Testing torchaudio.save() with BytesIO...
torch version: 2.8.0+cu128
torchaudio version: 2.8.0+cu128
/home/vllm/miniconda3/envs/test_torch29/lib/python3.11/site-packages/torchaudio/_backend/utils.py:337:
UserWarning: In 2.9, this function's implementation will be changed to use torchaudio.save_with_torchcodec` under the hood.
Some parameters like format, encoding, bits_per_sample, buffer_size, and ``backend`` will be ignored. We recommend that you port your code to rely directly on TorchCodec's encoder instead:
https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.encoders.AudioEncoder
warnings.warn(
/home/vllm/miniconda3/envs/test_torch29/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py:247: UserWarning: torio.io._streaming_media_encoder.StreamingMediaEncoder has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
s = torchaudio.io.StreamWriter(uri, format=muxer, buffer_size=buffer_size)
✓ torchaudio.save with BytesIO: SUCCESS
✓ soundfile workaround: SUCCESS (32044 bytes)
Here is my test script; I ran it under both 2.9 and 2.8 and compared the results:
# test_audio_save.py
import io

import torch
import torchaudio

# Create sample audio tensor
sample_rate = 16000
duration = 1  # 1 second
tensor = torch.randn(1, sample_rate * duration)  # [channel, time]

print("Testing torchaudio.save() with BytesIO...")
print(f"torch version: {torch.__version__}")
print(f"torchaudio version: {torchaudio.__version__}")

# Test 1: Try torchaudio.save with BytesIO (should fail in 2.9)
try:
    with io.BytesIO() as out:
        torchaudio.save(out, tensor, sample_rate, format="wav")
    print("✓ torchaudio.save with BytesIO: SUCCESS")
except Exception as e:
    print("✗ torchaudio.save with BytesIO: FAILED")
    print(f" Error: {e}")

# Test 2: Try soundfile workaround
try:
    import soundfile as sf

    audio_np = tensor.cpu().numpy()
    if audio_np.ndim == 2:
        audio_np = audio_np.T  # soundfile expects [time, channel]
    with io.BytesIO() as out:
        sf.write(out, audio_np, sample_rate, format="WAV")
        result = out.getvalue()  # read before the buffer is closed
    print(f"✓ soundfile workaround: SUCCESS ({len(result)} bytes)")
except Exception as e:
    print("✗ soundfile workaround: FAILED")
    print(f" Error: {e}")
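The "soundfile workaround: SUCCESS (32044 bytes)" line in the log is consistent with a 1 s, 16 kHz mono clip stored as 16-bit PCM: 16000 samples × 2 bytes = 32000 bytes of audio plus a 44-byte header. That assumes soundfile's default WAV subtype (PCM_16) and a canonical header with no extra chunks. A small stdlib sketch for sanity-checking such buffers; the helper names are hypothetical, and wave only reads integer-PCM WAVs, so a float WAV would raise instead.

```python
import io
import wave

WAV_HEADER_BYTES = 44  # 'RIFF' header (12) + 'fmt ' chunk (24) + 'data' chunk header (8)


def expected_pcm_wav_size(num_frames: int, channels: int = 1, sampwidth: int = 2) -> int:
    """Size of a canonical PCM WAV: header plus frames * channels * bytes per sample."""
    return WAV_HEADER_BYTES + num_frames * channels * sampwidth


def describe_wav(data: bytes) -> dict:
    """Read back basic parameters from an in-memory integer-PCM WAV."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return {
            "channels": r.getnchannels(),
            "sampwidth": r.getsampwidth(),
            "rate": r.getframerate(),
            "frames": r.getnframes(),
        }


if __name__ == "__main__":
    print(expected_pcm_wav_size(16000))  # 1 s at 16 kHz, mono, 16-bit
```

Running describe_wav on the bytes returned by the soundfile branch should report one channel, a 2-byte sample width, and 16000 frames at 16000 Hz.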
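@qinxuye Supporting torchaudio 2.9 turns out to be harder than I expected:
- FishSpeech-1.5 fails with module 'torchaudio' has no attribute 'list_audio_backends'. I diffed the torchaudio 2.8 and 2.9 source and found that list_audio_backends was removed in 2.9.
- CosyVoice2-0.5B additionally needs the wetext package installed, and ChatTTS needs transformers downgraded from the latest release to transformers==4.53.2 before it can launch. After that, both TTS models fail with Couldn't allocate AVFormatContext when generating speech; the fix is the same audio_to_bytes change I posted earlier (replacing torchaudio.save() with sf.write()).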
- index-tts fails with cannot import name 'SequenceSummary' from 'transformers.modeling_utils'.
So I haven't opened a PR with the code change; I only submitted https://github.com/xorbitsai/inference/pull/4178/, which pins the 2.8 versions for now.
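For the FishSpeech-1.5 failure listed above, the usual pattern for an attribute that disappears between library versions is a getattr guard instead of a direct call. A hypothetical sketch: the helper name audio_backends_or_empty is made up, it is not FishSpeech's code, and whether an empty list is an acceptable fallback depends on how FishSpeech uses the result, which this thread does not show.

```python
from types import SimpleNamespace


def audio_backends_or_empty(torchaudio_module) -> list:
    """Call list_audio_backends() if the module still provides it, else return [].

    list_audio_backends exists in torchaudio 2.8 but is gone in 2.9.
    """
    fn = getattr(torchaudio_module, "list_audio_backends", None)
    if callable(fn):
        return list(fn())
    return []


if __name__ == "__main__":
    # Stand-ins for the two torchaudio generations; no real torchaudio needed.
    old_style = SimpleNamespace(list_audio_backends=lambda: ["ffmpeg", "soundfile"])
    new_style = SimpleNamespace()
    print(audio_backends_or_empty(old_style))
    print(audio_backends_or_empty(new_style))
```

In real code the argument would be the imported torchaudio module itself.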
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
@qiulang May I ask why the voice timbre can't be changed, and why selecting an English reply still only produces Chinese speech?