GPT-SoVITS
Audio synthesized on a Mac is silent; only two breath sounds can be heard
When launching the TTS inference WebUI, an error is reported saying the FFmpeg extension cannot be found (a small diagnostic check for this follows the startup log below):
"/Users/taoxu/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/inference_webui.py
DEBUG:torio._extension.utils:Loading FFmpeg6
DEBUG:torio._extension.utils:Failed to load FFmpeg6 extension.
Traceback (most recent call last):
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
torch.ops.load_library(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
ctypes.CDLL(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg6.so, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
Referenced from: <47D7ABF2-086E-3080-BD43-088B7CE5B6B3> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg6.so
Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.58.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.58.dylib' (no such file), '/usr/local/lib/libavutil.58.dylib' (no such file), '/usr/lib/libavutil.58.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg5
DEBUG:torio._extension.utils:Failed to load FFmpeg5 extension.
Traceback (most recent call last):
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
torch.ops.load_library(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
ctypes.CDLL(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg5.so, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
Referenced from: <3ED882E0-A742-36B7-B54D-9D6FC74461A3> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg5.so
Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.57.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.57.dylib' (no such file), '/usr/local/lib/libavutil.57.dylib' (no such file), '/usr/lib/libavutil.57.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg4
DEBUG:torio._extension.utils:Failed to load FFmpeg4 extension.
Traceback (most recent call last):
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
_load_lib(lib)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
torch.ops.load_library(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
ctypes.CDLL(path)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
Referenced from: <0F44C7E0-FB42-3737-9603-D52E5202730D> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so
Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.56.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.56.dylib' (no such file), '/usr/local/lib/libavutil.56.dylib' (no such file), '/usr/lib/libavutil.56.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg
DEBUG:torio._extension.utils:Failed to load FFmpeg extension.
Traceback (most recent call last):
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 106, in _find_versionsed_ffmpeg_extension
raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
<All keys matched successfully>
Number of parameter: 77.49M
Running on local URL: http://0.0.0.0:9872
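The dlopen failures above mean torio (the media I/O backend bundled with recent torchaudio) cannot find the FFmpeg shared libraries (libavutil and friends) on any of the loader paths it tries. A stdlib-only check, assuming the conda environment path shown in the traceback, to see whether any of the expected dylibs exist there:

import ctypes.util
import glob
import os

# Path taken from the traceback above; adjust for your environment.
env_lib = "/Users/taoxu/miniconda3/envs/GPTSoVits/lib"

# torio tries FFmpeg 6, 5 and 4 in turn, so any one of these versions is enough.
for name in ("libavutil.58.dylib", "libavutil.57.dylib", "libavutil.56.dylib"):
    matches = glob.glob(os.path.join(env_lib, name))
    print(name, "->", matches if matches else "missing")

# Also ask the loader which avutil it can see at all, if any.
print("find_library('avutil') ->", ctypes.util.find_library("avutil"))

If all of these come back missing, installing the ffmpeg package into the same conda environment is one way to get the dylibs onto that search path; a Homebrew install on Apple silicon lands in /opt/homebrew/lib, which is not among the locations listed in the dlopen error.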
The audio synthesized afterwards has no sound (a quick way to confirm the file is really silent follows this run's log):
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
14%|████████████▋ | 204/1500 [00:53<08:42, 2.48it/s]T2S Decoding EOS [128 -> 332]
14%|████████████▋ | 204/1500 [00:53<05:39, 3.81it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
return torch._C._nn.pad(input, pad, mode, value)
1.151 0.569 53.598 7.114
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
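Before changing any environment variables, it helps to confirm that the exported file is genuinely near-silent rather than just failing to play back. A minimal check using only the Python 3.9 standard library; the file name is an assumption, substitute the wav saved from the WebUI:

import audioop
import wave

# Hypothetical path: replace with the file exported from the inference WebUI.
path = "output.wav"

with wave.open(path, "rb") as w:
    frames = w.readframes(w.getnframes())
    width = w.getsampwidth()

# A healthy 16-bit speech clip peaks in the thousands; values near zero mean real silence.
print("peak:", audioop.max(frames, width))
print("rms:", audioop.rms(frames, width))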
Set TOKENIZERS_PARALLELISM to false and synthesized again; still no sound.
export TOKENIZERS_PARALLELISM=false
python webui.py
13%|███████████▋ | 188/1500 [00:45<08:51, 2.47it/s]T2S Decoding EOS [128 -> 316]
13%|███████████▋ | 188/1500 [00:46<05:21, 4.08it/s]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
return torch._C._nn.pad(input, pad, mode, value)
1.581 0.717 46.118 6.470
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
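As an aside, the tokenizers message is only a warning about disabling parallel tokenization after a fork and is unrelated to the silent output. If the shell export is inconvenient, the variable can also be set at the top of the entry script before any Hugging Face import; a sketch, assuming webui.py is edited directly:

import os

# Must run before transformers/tokenizers is imported anywhere in the process.
os.environ["TOKENIZERS_PARALLELISM"] = "false"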
Set TOKENIZERS_PARALLELISM to true and synthesized again; still no sound.
export TOKENIZERS_PARALLELISM=true
python webui.py
13%|███████████▊ | 191/1500 [00:46<09:08, 2.38it/s]T2S Decoding EOS [128 -> 319]
13%|███████████▊ | 191/1500 [00:46<05:21, 4.07it/s]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
return torch._C._nn.pad(input, pad, mode, value)
1.423 0.718 46.951 6.457
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
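The UserWarnings in each run show that inference is going through the MPS backend. Purely as a diagnostic, not a documented fix, it can be worth checking whether the device itself is the variable; a sketch that reports MPS status and hints at a CPU-only comparison run:

import torch

print("MPS available:", torch.backends.mps.is_available())
print("MPS built:", torch.backends.mps.is_built())

# Isolation test only: run one synthesis with torch.device("cpu") instead of
# torch.device("mps") and compare the result with the MPS output.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("would select:", device)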
Changed is_half in config.py at the repository root from
is_half = eval(os.environ.get("is_half","True"))
to
is_half = eval(os.environ.get("is_half","False"))
and ran python webui.py again.
Synthesis is still silent.
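Since config.py reads this flag from the environment (as the line above shows), the same change can be made without editing the file by setting is_half before launching; as a side note, a plain string comparison is a safer way to parse the flag than eval. A minimal sketch, not the project's code:

import os

# Equivalent to the edit above, driven by the environment instead:
#   export is_half=False, then python webui.py
is_half = os.environ.get("is_half", "True").lower() == "true"
print("is_half:", is_half)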
This ffmpeg error shouldn't matter. I haven't figured out where the torio in the traceback comes from either, never seen it before; installing ffmpeg via brew should be enough. For the silent synthesis, have you tried other models? Or generating different text?
Did you select both models correctly? You didn't pick a model from a different training run, did you? When I get breathy output it's usually because I selected the wrong model.
Could any Mac expert offer some guidance? Much appreciated.
I installed ffmpeg with both brew and conda, and no matter which models I switch to, the generated 8-second audio contains only breathing sounds.
I selected the model I trained myself, not the default one.
Then try models from other epochs, something a bit lower.
Has anyone trained a reasonably good model on a Mac? Please share your experience. For me, only the lowest-epoch model produces a human voice, and even that doesn't sound good.
I just ran a training pass in Colab with the same source material and default parameters: inference works fine there, and the downloaded model also infers fine locally. Apart from slightly fast speech, it sounds okay. So some step of the local training pipeline must be at fault. Could it be that the pretrained base model we download doesn't support Apple silicon?
My Mac has an Apple chip; is yours Intel?
An M2 chip. My local training attempts have all failed, producing only snoring sounds. Someone suggested the pretrained base model we download may not support Apple silicon, which would explain it. A model trained in Colab and downloaded can be used for inference locally without problems.