GPT-SoVITS
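Can the CPU be used for training?
I spent ages on this and had already finished labeling, and then I got this disappointing notice.
Is CPU training possible?
Bookmarking this. While I'm here: has anyone got the CPU changes working, and how fast are training and inference?
Even with a 4070 I have to wait. Even if CPU works, it will probably be an extremely long wait; I'd suggest upgrading your hardware.
I don't recommend CPU. With a 12400 and 32 GB of RAM, two processes and a batch size of 20 (which fills the memory), it runs at 40 s/it.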
What about inference only? That shouldn't be slow, right? Has anyone tried it?
A 6 s audio clip took 25 s to synthesize.
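For context, those two numbers work out to a real-time factor (synthesis time divided by audio duration) of roughly 4, i.e. about four times slower than real time. A small sketch of the arithmetic; the function name is made up for illustration and the inputs are just the figures reported in this comment:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time per second of audio; values above 1 are slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures reported above: 25 s of compute for a 6 s clip.
rtf = real_time_factor(25, 6)
print(f"RTF = {rtf:.2f}")
```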
I'm curious how to train on CPU.
1. In the three files under GPT-SoVITS\GPT_SoVITS\prepare_datasets, comment out the line os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES").
2. In "GPT-SoVITS\GPT_SoVITS\s2_train.py", comment out the line directly below """Assume Single Node Multi GPUs Training Only""".
3. In the same file, change every to("mps") to to("cpu").
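If you would rather not edit the files by hand, step 1 could be scripted along these lines. This is only a sketch: `comment_out` and `patch_file` are hypothetical helpers, not part of GPT-SoVITS, and the thread does not name the three files, so you would have to pass their paths yourself. Back the files up first, and note that commenting out a line that is the only statement in an indented block would leave invalid Python.

```python
from pathlib import Path

# Text that identifies the assignment to disable (taken from step 1).
TARGET = 'os.environ["CUDA_VISIBLE_DEVICES"]'


def comment_out(source: str, needle: str = TARGET) -> str:
    """Prefix every not-yet-commented line containing `needle` with '# ', keeping indentation."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in stripped and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        out.append(line)
    return "".join(out)


def patch_file(path: Path) -> None:
    """Rewrite one file in place (hypothetical helper; make a backup before calling it)."""
    text = path.read_text(encoding="utf-8")
    path.write_text(comment_out(text), encoding="utf-8")


# Demonstration on an in-memory sample instead of a real file.
sample = (
    "import os\n"
    'os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES")\n'
    "x = 1\n"
)
print(comment_out(sample))
```

Running the helper twice is harmless, because lines that already start with `#` are skipped.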
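Sorry, my previous comment left some things out.
In s2_train, the line os.environ["CUDA_VISIBLE_DEVICES"] = hps.train.gpu_numbers.replace("-", ",") also needs to be commented out.
In main of s2_train, set n_gpus manually to specify how many training processes to start.
In main of s1_train, change accelerator to cpu and devices to 1 in the trainer initialization. If a type mismatch error appears when running GPT training, also change precision to 32.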
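The s1_train change above amounts to overriding three keyword arguments of the PyTorch Lightning Trainer (`accelerator`, `devices`, `precision`). A minimal sketch of those overrides as a plain dictionary, so it runs without Lightning installed; the helper name and the "original" values are assumptions for illustration, and the real arguments in s1_train.py may differ:

```python
def cpu_trainer_overrides(trainer_kwargs: dict, force_fp32: bool = False) -> dict:
    """Return a copy of `trainer_kwargs` adjusted for single-process CPU training."""
    patched = dict(trainer_kwargs)  # leave the caller's dict untouched
    patched["accelerator"] = "cpu"
    patched["devices"] = 1
    if force_fp32:
        # Only needed if GPT training fails with a type mismatch error.
        patched["precision"] = 32
    return patched


# Hypothetical starting values; check s1_train.py for the real ones.
original = {"accelerator": "gpu", "devices": -1, "precision": "16-mixed", "max_epochs": 15}
print(cpu_trainer_overrides(original, force_fp32=True))
```

With Lightning available, the result would be passed as `Trainer(**cpu_trainer_overrides(...))`; unrelated arguments such as `max_epochs` pass through unchanged.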
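CPU training is feasible in theory; the main thing is, as @ISDHN said, changing the relevant parts of the code to use the CPU. I haven't tested training, but inference seems to be much slower than on a GPU.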
Hello, I applied the code changes above to the [prepackaged build], but 1B fine-tuning training did not produce any model files.
Uh...
It looks like you're in the wrong folder. In GPT-SoVITS\GPT_SoVITS\prepare_datasets, the first GPT-SoVITS is the folder that contains webui.py.
Huh? Found it.
Still the same.
Right, it still shows that, but you can ignore it and carry on with the next steps.
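Thanks, it's running now.
And then it failed again.
Console:
Please post a screenshot of the webui.
WebUI screenshot:
I followed the steps above, but 1B fine-tuning training did not produce any model files. What should I do? T-T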
I'm no expert, you're asking the wrong person QAQ
Check the console output in the background command-line window.
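So what should I do?
The list file path you entered seems to contain a strange character (right before D:\).
I can't see it in the screenshot, but the console output shows one extra character.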
After SoVITS training finished, the console showed only:
"D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\python.exe" GPT_SoVITS/s2_train.py --config "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\TEMP/tmp_s2.json"
After GPT training finished:
"D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\TEMP/tmp_s1.yaml"
Seed set to 1234
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [17729382180.china.huawei.com]:59168 (system error: 10049 - 在其上下文中,该请求的地址无效。).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [17729382180.china.huawei.com]:59168 (system error: 10049 - 在其上下文中,该请求的地址无效。).
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
semantic_data_len: 0
phoneme_data_len: 3
Empty DataFrame
Columns: [item_name, semantic_audio]
Index: []
Traceback (most recent call last):
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\s1_train.py", line 170, in <module>
    main(args)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\s1_train.py", line 146, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 950, in _run
    call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 92, in _call_setup_hook
    _call_lightning_datamodule_hook(trainer, "setup", stage=fn)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\data_module.py", line 29, in setup
    self._train_dataset = Text2SemanticDataset(
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\dataset.py", line 107, in __init__
    self.init_batch()
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\dataset.py", line 187, in init_batch
    for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero
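Reading the log: semantic_data_len: 0 together with the empty DataFrame suggests that no semantic features were loaded, so the division in init_batch has nothing to divide by. Below is a sketch of the same expression with an explicit guard. It assumes `leng` is the number of usable training items, and the `min_num` values are arbitrary; it illustrates the failure, it is not the project's code.

```python
def repeat_count(min_num: int, leng: int) -> int:
    """Same expression as the failing line in the traceback, with a clearer error for empty data."""
    if leng == 0:
        raise ValueError(
            "no usable training items (semantic data is empty); "
            "re-run dataset preparation and check the input paths"
        )
    return max(2, int(min_num / leng))


print(repeat_count(100, 3))    # small dataset: repeated many times
print(repeat_count(100, 200))  # large dataset: floor of 2 applies

try:
    repeat_count(100, 0)
except ValueError as err:
    print("error:", err)
```

In other words, the crash is a symptom of the preprocessing step producing no data, not of the CPU changes themselves.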
Let me take a look.
I suggest searching for \u202a yourself. This is not a problem with this codebase or with CPU training.
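For reference, U+202A is LEFT-TO-RIGHT EMBEDDING, an invisible Unicode formatting character that can ride along when a path is copied and pasted. A small sketch of how a pasted path could be inspected and cleaned before use; the helper name and the sample path are made up for illustration:

```python
import unicodedata


def strip_invisible(path: str) -> str:
    """Drop Unicode format characters (category Cf, e.g. U+202A) and surrounding whitespace."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf").strip()


# Hypothetical pasted path with a hidden leading character.
pasted = "\u202aD:\\data\\train.list"
print(repr(pasted))                   # repr() makes the hidden character visible
print(repr(strip_invisible(pasted)))  # cleaned path
```

Printing with `repr()` is the quickest check: the hidden character shows up as an escape sequence even though the path looks normal in the webui text box.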
How did you modify s2_train.py?
- I commented out this line:
"""Assume Single Node Multi GPUs Training Only"""
# assert torch.cuda.is_available() or torch.backends.mps.is_available(), "Only GPU training is allowed."
- I changed to("mps") to to("cpu").
- I found no exact match for this line:
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES")
so I commented out this one instead: #os.environ["CUDA_VISIBLE_DEVICES"] = hps.train.gpu_numbers.replace("-", ",")
- "In main of s2_train, set n_gpu manually to specify how many training processes to start": I don't know how to change this; it was already n_gpus = 1.
def main():
    """Assume Single Node Multi GPUs Training Only"""
    # assert torch.cuda.is_available() or torch.backends.mps.is_available(), "Only GPU training is allowed."
    # if torch.backends.mps.is_available():
    #     n_gpus = 1
    # else:
    #     n_gpus = torch.cuda.device_count()
    n_gpus = 1
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = str(randint(20000, 55555))
    mp.spawn(
        run,
        nprocs=n_gpus,
        args=(
            n_gpus,
            hps,
        ),
    )
Thanks for pointing that out, I wrote it wrong above 💦💦💦