GPT-SoVITS
Error when training GPT with the integrated package: RuntimeError: unmatched '}' in format string
D:\GPT-SoVITS>runtime\python.exe webui.py
Running on local URL: http://0.0.0.0:9874
"D:\GPT-SoVITS\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 171, in <module>
main(args)
File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 147, in main
trainer.fit(model, data_module, ckpt_path=ckpt_path)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
self.strategy.setup_environment()
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
self.setup_distributed()
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
_init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
File "D:\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
return TCPStore(
RuntimeError: unmatched '}' in format string
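Same here. After a long wait it reports a timeout. batch_size is 3 and everything else is left at the defaults. I asked GPT and checked the port, which is not in use. How can I fix this?
The file structure is as follows:
The temp_s1.yaml file is as follows: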
same problem. OS: Win11, CUDA 12.1
I am facing the same issue as well. OS: Win11, torch 2.1.2+cu118, torchaudio 2.0.1+cu118, torchmetrics 1.3.0.post0, torchvision 0.15.1+cu118
If anyone has insights or solutions, I would greatly appreciate the help. Thank you!
Same problem. OS: Win11, CUDA 11.8
I tried all of the following combinations and the problem is still not solved: Python 3.9, Python 3.10; PyTorch 2.0.1, PyTorch 2.1.2
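Since several reports above list package versions by hand, a small script can collect the same information in one place. This is only a convenience sketch: the function name collect_versions is made up for illustration, the package list is taken from the versions quoted in this thread, and the CUDA toolkit version is not detected. Run it with the bundled runtime\python.exe so it inspects the same environment that training uses.

```python
import platform
import sys
from importlib import metadata

# Package names taken from the reports in this thread.
DEFAULT_PACKAGES = ("torch", "torchaudio", "torchvision", "torchmetrics")


def collect_versions(packages=DEFAULT_PACKAGES):
    """Return a dict with OS, Python, and installed package versions."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the key so a missing package is visible in the report.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```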
I found a temporary workaround. Open GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py, find the line start_daemon = rank == 0 (around line 175), and add the line hostname = "localhost" directly below it. That is all.
Thanks, it works well.
Thank you very much! It works!
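I added the hostname = "localhost" line to rendezvous.py, and now I get a different error:
File "H:\SDAI\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 177
return TCPStore( hostname, port, world_size, start_daemon, timeout, multi_tenant=True)
IndentationError: unexpected indent
Is the indentation missing? The added line has to start at the same column as the line above it, start_daemon = rank == 0.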
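To avoid the indentation mistake, the edit can be applied to the file's text mechanically by copying the leading whitespace of the start_daemon line. The following is a minimal sketch, not an official patch: the function name insert_hostname_override is hypothetical, and it assumes the file contains the line start_daemon = rank == 0 exactly as quoted in this thread. The sample text in the demo is a shortened stand-in, not the real contents of rendezvous.py.

```python
import re

# The line quoted in the thread; the override goes directly below it.
ANCHOR = re.compile(r"^(?P<indent>[ \t]*)start_daemon = rank == 0[ \t]*$")
OVERRIDE = 'hostname = "localhost"'


def insert_hostname_override(source: str) -> str:
    """Return source with the override inserted below the anchor line.

    The new line reuses the anchor line's indentation, and the function
    does nothing if the override is already present on the next line.
    """
    lines = source.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        match = ANCHOR.match(line)
        already = i + 1 < len(lines) and lines[i + 1].strip() == OVERRIDE
        if match and not already:
            out.append(match.group("indent") + OVERRIDE)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Shortened stand-in for the relevant part of the file.
    sample = (
        "    else:\n"
        "        start_daemon = rank == 0\n"
        "        return TCPStore(\n"
    )
    print(insert_hostname_override(sample))
```

Back up rendezvous.py before writing the result to disk; applying the function twice leaves the text unchanged.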
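(To everyone hitting this problem, @win10ogod @light1943) Do ping 127.0.0.1 and ping localhost give you the same result? It looks like the cause is that the address can only be written as localhost and not as 127.0.0.1?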
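The ping comparison can also be done from Python by asking the operating system's resolver what each name maps to. This sketch uses only the standard library socket.getaddrinfo call; the function name resolve_addresses is made up for illustration, and the result says nothing about how torch itself treats the hostname.

```python
import socket
from typing import Set


def resolve_addresses(host: str) -> Set[str]:
    """Return every address the OS resolver reports for host."""
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IPv4 or IPv6 address.
    return {info[4][0] for info in infos}


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        try:
            print(name, "->", sorted(resolve_addresses(name)))
        except socket.gaierror as exc:
            print(name, "-> resolution failed:", exc)
```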
https://github.com/RVC-Boss/GPT-SoVITS/commit/59f35adad85815df27e9c6b33d420f5ebfd8376b In theory this commit fixes the original poster's problem. If it still fails, try the method from the comment above of adding hostname = "localhost".
"C:\GPT-SoVITS-beta\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "C:\GPT-SoVITS-beta\TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "C:\GPT-SoVITS-beta\GPT_SoVITS\s1_train.py", line 170, in
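How can I solve this? I followed the method posted above, but it still errors out. Please help.
Actually, that method did not work for me either, but the latest version fixes it. Go update.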
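Yes, localhost is 127.0.0.1, but here the address can only be written as localhost; writing the IP 127.0.0.1 causes the error. This is a strange torch bug, and it seems to appear only on Windows. Switching to localhost is a fix that someone found elsewhere recently.
The new version 59f35ad already fixes this problem, so there is no need to modify torch's rendezvous.py anymore. Thanks for the help!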
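For anyone who wants the same effect without editing site-packages, one option is to set the MASTER_ADDR environment variable before training starts, since the traceback above goes through torch's _env_rendezvous_handler, which reads the address from the environment. This is a sketch under assumptions: I have not checked what commit 59f35ad actually changes, the helper name prefer_localhost is hypothetical, and it assumes the Lightning launcher keeps a MASTER_ADDR that is already set rather than overwriting it.

```python
import os
from typing import MutableMapping


def prefer_localhost(env: MutableMapping[str, str] = os.environ) -> str:
    """Point MASTER_ADDR at "localhost" for single-machine runs.

    An address that names another host is left untouched, so this does
    not break a real multi-node setup.
    """
    current = env.get("MASTER_ADDR")
    if current in (None, "", "127.0.0.1"):
        env["MASTER_ADDR"] = "localhost"
    return env["MASTER_ADDR"]


if __name__ == "__main__":
    # Call this before the trainer initializes distributed training.
    print("MASTER_ADDR =", prefer_localhost())
```

The call would have to run in the training script before trainer.fit(...) is reached, because the rendezvous happens inside that call.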
I have kept updating to the latest version, but the problem is still there, with exactly the same error as the one I posted above. I really don't know what is going wrong.
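If you applied my method, the error should now point to line 177 instead of line 176, so check whether you edited the wrong file. If you edited the right file, and also updated to a version after 59f35ad, and it still errors out, then you may have to look for a solution on your own. After all, nobody has really analyzed this strange torch problem; all we know is that in some environments hostname does not accept an IP.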
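One way to check whether the right file was edited is to look for the added line in the exact file named in the traceback. The sketch below is an illustration only: the function name find_override_line is hypothetical, and it simply reports the line number where hostname = "localhost" appears, if it appears at all.

```python
from pathlib import Path
from typing import Optional

OVERRIDE = 'hostname = "localhost"'


def find_override_line(path) -> Optional[int]:
    """Return the 1-based line number of the override, or None if absent."""
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == OVERRIDE:
            return number
    return None


# Usage idea: pass the rendezvous.py path copied from the traceback.
# To see which copy the bundled interpreter really loads, run this with
# runtime\python.exe:
#     import torch.distributed.rendezvous as rdzv
#     print(rdzv.__file__)
```

If the function returns None for the path shown in the traceback, the edit went into a different copy of rendezvous.py.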
File "C:\GPT-SoVITS-beta\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
return TCPStore(
RuntimeError: unmatched '}' in format string