GPT-SoVITS: Can it be trained on the CPU?

I spent ages on this and had finished labeling all my data, only to get this disappointing notice. Can I train on the CPU instead?

Jan 29 '24 13:01 zackzheng1121

Bookmarking this. By the way, has anyone already got it working on the CPU? How fast are training and inference?

Jan 30 '24 03:01 brooks348

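Even with a 4070 I have to wait. Even if the CPU works, I expect the wait would be extremely long. I'd suggest upgrading your hardware instead.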

Jan 30 '24 03:01 angenet

I don't recommend using the CPU. On a 12400 with 32 GB of RAM, running two processes with a batch size of 20 (which fills the memory), I get 40 s/it.

Jan 30 '24 05:01 ISDHN

@ISDHN What about inference? That shouldn't be slow, right? Have you tried it?

Jan 30 '24 06:01 brooks348

Synthesizing 6 s of audio took 25 s.
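To put that figure in perspective, here is a small sketch of the real-time factor (processing time divided by audio duration). The function name is only illustrative and is not part of GPT-SoVITS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return processing time per second of audio; values above 1 are slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures from the measurement above: 6 s of audio took 25 s to synthesize.
rtf = real_time_factor(25.0, 6.0)
print(f"real-time factor: {rtf:.2f}")  # about 4.17
```

In other words, CPU inference on that machine ran roughly four times slower than real time.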

Jan 30 '24 06:01 ISDHN

I'm curious how to train on the CPU.

Jan 30 '24 07:01 zackzheng1121

1. In the three files under GPT-SoVITS\GPT_SoVITS\prepare_datasets, comment out the line os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES").
2. In "GPT-SoVITS\GPT_SoVITS\s2_train.py", comment out the line directly below """Assume Single Node Multi GPUs Training Only""".
3. In the same file, change every to("mps") to to("cpu").
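The three manual edits can also be sketched as a text transformation. This is only an illustration, not a tool shipped with GPT-SoVITS: patch_for_cpu is a hypothetical helper, and the assert pattern assumes that the line below the docstring is the torch.cuda.is_available() assertion quoted later in this thread. Back up the files before trying anything like this, and check by hand that no commented-out line leaves an empty block behind.

```python
import re

# Patterns for the lines that the steps above say to comment out.
CUDA_ENV_LINE = re.compile(r'^os\.environ\["CUDA_VISIBLE_DEVICES"\]\s*=')
GPU_ASSERT_LINE = re.compile(r'^assert torch\.cuda\.is_available\(\)')


def patch_for_cpu(source: str) -> str:
    """Comment out GPU-only lines and retarget .to("mps") calls to the CPU."""
    patched = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if CUDA_ENV_LINE.match(stripped) or GPU_ASSERT_LINE.match(stripped):
            # Keep the original indentation and prefix the statement with "# ".
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        patched.append(line.replace('to("mps")', 'to("cpu")'))
    return "".join(patched)


# Tiny made-up sample, just to show the effect of the transformation.
sample = (
    'os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES")\n'
    '    x = x.to("mps")\n'
)
print(patch_for_cpu(sample))
```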

Jan 30 '24 08:01 ISDHN

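Sorry, my previous post was incomplete.
In s2_train, the line os.environ["CUDA_VISIBLE_DEVICES"] = hps.train.gpu_numbers.replace("-", ",") also needs to be commented out.
In main of s2_train, set n_gpus manually to specify how many training processes to start. In main of s1_train, change accelerator to cpu and devices to 1 where the trainer is initialized. If GPT training then fails with a type mismatch, also change precision to 32.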
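As a rough sketch of the s1_train part: accelerator, devices, and precision are keyword arguments of the PyTorch Lightning Trainer, so the change amounts to overriding three values. The helper below only manipulates a plain dict; cpu_trainer_overrides and the starting values are hypothetical, and the real arguments live in main() of s1_train.py.

```python
def cpu_trainer_overrides(trainer_kwargs: dict, force_fp32: bool = False) -> dict:
    """Return a copy of the Trainer keyword arguments adjusted for CPU-only training."""
    overrides = dict(trainer_kwargs)
    overrides["accelerator"] = "cpu"
    overrides["devices"] = 1
    if force_fp32:
        # Only needed if GPT training fails with a type mismatch.
        overrides["precision"] = 32
    return overrides


# Hypothetical starting values, chosen only for illustration.
original = {"accelerator": "gpu", "devices": -1, "precision": 16, "max_epochs": 15}
print(cpu_trainer_overrides(original, force_fp32=True))
```

The resulting dict would then be passed on as Trainer(**kwargs); everything not listed in the post above is left untouched.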

Jan 30 '24 10:01 ISDHN

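CPU training is feasible in theory. It mainly comes down to changing the relevant parts of the code to use the CPU, as @ISDHN described. I haven't tested training, but inference appears to be much slower than on the GPU.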

Jan 30 '24 13:01 Lion-Wu

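> Sorry, my previous post was incomplete. In s2_train, the line os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES") also needs to be commented out. In main of s2_train, set n_gpu manually to specify how many training processes to start. In main of s1_train, change accelerator to cpu and devices to 1 where the trainer is initialized.

Hello, I applied the changes above to the [prepackaged build], but 1B fine-tuning training did not produce any model files.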

Jan 31 '24 03:01 erhuzi001

Uh...

Jan 31 '24 03:01 zackzheng1121

You seem to be looking in the wrong folder. In GPT-SoVITS\GPT_SoVITS\prepare_datasets, the first GPT-SoVITS is the folder that contains webui.py.

Jan 31 '24 03:01 ISDHN

Oh? Found it.

Jan 31 '24 03:01 zackzheng1121

Still the same.

Jan 31 '24 03:01 zackzheng1121

Right, it will still show that, but you can ignore it and carry on with the next steps.

Jan 31 '24 03:01 ISDHN

Thanks, it started running, but then it failed again.

Backend console:

Jan 31 '24 03:01 zackzheng1121

Please post a screenshot of the webui.

Jan 31 '24 03:01 ISDHN

Webui screenshot:

Jan 31 '24 03:01 zackzheng1121

Could someone help? I followed the steps above, but 1B fine-tuning training did not produce any model files. What should I do? T-T

Jan 31 '24 03:01 erhuzi001

@erhuzi001 I'm no expert, you're asking the wrong person QAQ

Jan 31 '24 03:01 zackzheng1121

@erhuzi001 Check the backend command line output.

Jan 31 '24 03:01 ISDHN

@ISDHN Then what should I do?

Jan 31 '24 03:01 zackzheng1121

The list file path you entered seems to contain a strange character (just before D:\).

Jan 31 '24 03:01 ISDHN

Jan 31 '24 03:01 zackzheng1121

I can't see it in the screenshot, but the backend message shows one extra character.

Jan 31 '24 03:01 ISDHN

After SoVITS training finished, the backend showed only:

"D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\python.exe" GPT_SoVITS/s2_train.py --config "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\TEMP/tmp_s2.json"

After GPT training finished:

"D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\TEMP/tmp_s1.yaml"
Seed set to 1234
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [17729382180.china.huawei.com]:59168 (system error: 10049 - 在其上下文中，该请求的地址无效。).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [17729382180.china.huawei.com]:59168 (system error: 10049 - 在其上下文中，该请求的地址无效。).
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

semantic_data_len: 0
phoneme_data_len: 3
Empty DataFrame
Columns: [item_name, semantic_audio]
Index: []
Traceback (most recent call last):
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\s1_train.py", line 170, in <module>
    main(args)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\s1_train.py", line 146, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 950, in _run
    call._call_setup_hook(self)  # allow user to setup lightning_module in accelerator environment
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 92, in _call_setup_hook
    _call_lightning_datamodule_hook(trainer, "setup", stage=fn)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\data_module.py", line 29, in setup
    self._train_dataset = Text2SemanticDataset(
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\dataset.py", line 107, in __init__
    self.init_batch()
  File "D:\users\xxxx\Downloads\GPT-SoVITS-beta\GPT-SoVITS-beta0128\GPT_SoVITS\AR\data\dataset.py", line 187, in init_batch
    for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero
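
Just before the traceback, the log reports semantic_data_len: 0 and an empty DataFrame, so the division by zero most likely comes from an empty training set rather than from the CPU changes themselves (an assumption based only on this log). A minimal sketch of a guard that would turn the crash into a readable message; min_num and leng are the names from the traceback, while repeat_count and the value 100 are hypothetical:

```python
def repeat_count(min_num: int, leng: int) -> int:
    """Mirror max(2, int(min_num / leng)) but fail clearly when there are no samples."""
    if leng == 0:
        raise ValueError(
            "No usable training samples were found; "
            "check that dataset preparation produced semantic data."
        )
    return max(2, int(min_num / leng))


try:
    repeat_count(min_num=100, leng=0)
except ValueError as err:
    print(f"dataset check failed: {err}")
```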

Jan 31 '24 03:01 erhuzi001

Let me take a look.

Jan 31 '24 03:01 zackzheng1121

@zackzheng1121 I suggest searching for \u202a yourself. This is not a problem with this codebase or with CPU training.
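
For context, U+202A is LEFT-TO-RIGHT EMBEDDING, an invisible Unicode format character that can come along when a path is copied and pasted. A minimal sketch of stripping such characters before using the path; strip_format_chars and the sample path are hypothetical and not part of GPT-SoVITS:

```python
import unicodedata


def strip_format_chars(path: str) -> str:
    """Drop invisible Unicode format characters (category Cf), such as U+202A."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf")


# Hypothetical pasted path that starts with an invisible U+202A.
pasted = "\u202aD:\\data\\train.list"
cleaned = strip_format_chars(pasted)
print(repr(pasted))
print(repr(cleaned))
```

Retyping the path by hand instead of pasting it avoids the character altogether.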

Jan 31 '24 03:01 ISDHN

> How did you modify s2_train.py?

I commented out this line:

 """Assume Single Node Multi GPUs Training Only"""
    # assert torch.cuda.is_available() or torch.backends.mps.is_available(), "Only GPU training is allowed."

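I changed to("mps") to to("cpu").
There was no exact match for os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES"), so I commented out this line instead: #os.environ["CUDA_VISIBLE_DEVICES"] = hps.train.gpu_numbers.replace("-", ",")
As for manually setting n_gpu in main of s2_train to choose how many training processes to start: I don't know how to change it. It was already n_gpus = 1.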

Jan 31 '24 03:01 erhuzi001

> There was no exact match for os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("_CUDA_VISIBLE_DEVICES"), so I commented out this line instead: #os.environ["CUDA_VISIBLE_DEVICES"] = hps.train.gpu_numbers.replace("-", ",")
>
> As for manually setting n_gpu in main of s2_train to choose how many training processes to start: I don't know how to change it. It was already n_gpus = 1.

def main():
    """Assume Single Node Multi GPUs Training Only"""
    # assert torch.cuda.is_available() or torch.backends.mps.is_available(), "Only GPU training is allowed."
    # if torch.backends.mps.is_available():
    #     n_gpus = 1
    # else:
    #     n_gpus = torch.cuda.device_count()
    n_gpus = 1
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = str(randint(20000, 55555))
    mp.spawn(
        run,
        nprocs=n_gpus,
        args=(
            n_gpus,
            hps,
        ),
    )

Thanks for pointing that out. I made a mistake in my post above 💦💦💦
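
On the n_gpus question: in the main() quoted above, n_gpus is passed to mp.spawn as nprocs, so on a CPU-only machine it is effectively the number of training processes. A sketch that keeps the original branches and adds a CPU fallback; choose_nprocs and cpu_procs are hypothetical names, and in the real script the first two arguments would come from torch.cuda.device_count() and torch.backends.mps.is_available():

```python
def choose_nprocs(cuda_device_count: int, mps_available: bool, cpu_procs: int = 1) -> int:
    """Pick how many training processes to spawn, falling back to CPU workers."""
    if mps_available:
        return 1
    if cuda_device_count > 0:
        return cuda_device_count
    # No GPU backend: the count is chosen by hand, as described in the thread.
    return cpu_procs


# CPU-only machine with the default of one process.
print(choose_nprocs(cuda_device_count=0, mps_available=False))
# CPU-only machine running two processes, as in the 12400 measurement.
print(choose_nprocs(cuda_device_count=0, mps_available=False, cpu_procs=2))
```

Leaving n_gpus = 1 therefore means a single CPU training process; raising it starts more processes at the cost of more memory.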

Jan 31 '24 03:01 ISDHN
