Retrieval-based-Voice-Conversion-WebUI
Dual GPU - RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer
Issue: Training Process Halts with Errors
Environment Specifications:
- Operating System: Debian GNU/Linux 12 (bookworm) x86_64 6.1.0-9-amd64
- Python Version: 3.10
- GPU: Dual NVIDIA GeForce RTX 4090s
- Python Environment: Conda virtual environment specifically for RVC-WebUI
Steps to Reproduce:
- Created a conda environment with Python version 3.10.
- Cloned RVC's git repository.
- Ran run.sh to install dependencies, then launched RVC-WebUI.
- Navigated to the training tab, performed preprocessing, and then performed feature extraction.
- Started the training process.
Issue Encountered: During training, I encountered a series of errors, causing the training process to halt. The detailed error log is as follows:
2023-11-18 02:32:55 | INFO | __main__ | Use gpus: 0-1
2023-11-18 02:32:55 | INFO | __main__ | "python3" infer/modules/train/train.py -e "model-test" -sr 48k -f0 1 -bs 12 -g 0-1 -te 1000 -se 1 -pg /home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0G48k.pth -pd /home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0D48k.pth -l 0 -c 1 -sw 0 -v v2
INFO:model-test:{'data': {'filter_length': 2048, 'hop_length': 480, 'max_wav_value': 32768.0, 'mel_fmax': None, 'mel_fmin': 0.0, 'n_mel_channels': 128, 'sampling_rate': 48000, 'win_length': 2048, 'training_files': './logs/model-test/filelist.txt'}, 'model': {'filter_channels': 768, 'gin_channels': 256, 'hidden_channels': 192, 'inter_channels': 192, 'kernel_size': 3, 'n_heads': 2, 'n_layers': 6, 'p_dropout': 0, 'resblock': '1', 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'resblock_kernel_sizes': [3, 7, 11], 'spk_embed_dim': 109, 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [24, 20, 4, 4], 'upsample_rates': [12, 10, 2, 2], 'use_spectral_norm': False}, 'train': {'batch_size': 12, 'betas': [0.8, 0.99], 'c_kl': 1.0, 'c_mel': 45, 'epochs': 20000, 'eps': 1e-09, 'fp16_run': True, 'init_lr_ratio': 1, 'learning_rate': 0.0001, 'log_interval': 200, 'lr_decay': 0.999875, 'seed': 1234, 'segment_size': 17280, 'warmup_epochs': 0}, 'model_dir': './logs/model-test', 'experiment_dir': './logs/model-test', 'save_every_epoch': 1, 'name': 'model-test', 'total_epoch': 1000, 'pretrainG': '/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0G48k.pth', 'pretrainD': '/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0D48k.pth', 'version': 'v2', 'gpus': '0-1', 'sample_rate': '48k', 'if_f0': 1, 'if_latest': 0, 'save_every_weights': '0', 'if_cache_data_in_gpu': 1}
/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/.venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:infer.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
DEBUG:infer.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
INFO:model-test:loaded pretrained /home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0G48k.pth
Process Process-2:
Traceback (most recent call last):
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/infer/modules/train/train.py", line 213, in run
utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/infer/lib/train/utils.py", line 213, in latest_checkpoint_path
x = f_list[-1]
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hkyouma/miniconda3/envs/voicegen-rvc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/hkyouma/miniconda3/envs/voicegen-rvc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/infer/modules/train/train.py", line 232, in run
logger.info(
UnboundLocalError: local variable 'logger' referenced before assignment
INFO:model-test:<All keys matched successfully>
INFO:model-test:loaded pretrained /home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2/f0D48k.pth
INFO:model-test:<All keys matched successfully>
/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/.venv/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Process Process-1:
Traceback (most recent call last):
File "/home/hkyouma/miniconda3/envs/voicegen-rvc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/hkyouma/miniconda3/envs/voicegen-rvc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/infer/modules/train/train.py", line 271, in run
train_and_evaluate(
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/infer/modules/train/train.py", line 484, in train_and_evaluate
scaler.scale(loss_disc).backward()
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/home/hkyouma/ai/voicegen/Retrieval-based-Voice-Conversion-WebUI/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [127.0.0.1]:27868
Any assistance or guidance on resolving these errors would be greatly appreciated.
The error seems to stem from using two GPUs for training. Using just one makes the issue go away.
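Worth noting: the gloo "Connection closed by peer" is usually a secondary symptom. In the log above, Process-2 had already died on the IndexError/UnboundLocalError, and the surviving rank then fails the moment it reads from the dropped TCP pair. A stdlib-only sketch of that failure mode (roles and port are purely illustrative; the real pair in the log was on :27868):

```python
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))            # any free port
srv.listen(1)
port = srv.getsockname()[1]

def peer_that_crashes():
    conn, _ = srv.accept()
    conn.close()                      # the "peer" exits without sending anything

t = threading.Thread(target=peer_that_crashes)
t.start()
cli = socket.create_connection(("127.0.0.1", port))
data = cli.recv(1024)                 # empty read: TCP's "connection closed by peer"
t.join()
cli.close()
srv.close()
```

So the actionable error is the first traceback (the missing checkpoint / logger crash on the other rank), not the gloo message itself.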
I changed to gpu:0 but still get the same error:
(cvr) kk1@kk1:/media/disk1_ssd/Rvc$ python infer-web.py
2023-11-19 23:53:30 | INFO | configs.config | Found GPU NVIDIA GeForce RTX 3090
is_half:True, device:cuda:0
2023-11-19 23:53:31 | INFO | __main__ | Use Language: en_US
Running on local URL: http://0.0.0.0:7865
2023-11-19 23:54:12 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 200 OK"
2023-11-19 23:54:12 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/preprocess.py "/media/disk1_ssd/voc" 40000 22 "/media/disk1_ssd/Rvc/logs/trial" False 3.0
['infer/modules/train/preprocess.py', '/media/disk1_ssd/voc', '40000', '22', '/media/disk1_ssd/Rvc/logs/trial', 'False', '3.0']
start preprocess
['infer/modules/train/preprocess.py', '/media/disk1_ssd/voc', '40000', '22', '/media/disk1_ssd/Rvc/logs/trial', 'False', '3.0']
/media/disk1_ssd/voc/.DS_Store->Traceback (most recent call last):
File "/media/disk1_ssd/Rvc/infer/lib/audio.py", line 63, in load_audio
audio2(f, out, "f32le", sr)
File "/media/disk1_ssd/Rvc/infer/lib/audio.py", line 34, in audio2
inp = av.open(i, "rb")
File "av/container/core.pyx", line 401, in av.container.core.open
File "av/container/core.pyx", line 265, in av.container.core.Container.__cinit__
File "av/container/core.pyx", line 285, in av.container.core.Container.err_check
File "av/error.pyx", line 336, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: '/media/disk1_ssd/voc/.DS_Store'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "infer/modules/train/preprocess.py", line 87, in pipeline
audio = load_audio(path, self.sr)
File "/media/disk1_ssd/Rvc/infer/lib/audio.py", line 73, in load_audio
raise RuntimeError(traceback.format_exc())
RuntimeError: Traceback (most recent call last):
File "/media/disk1_ssd/Rvc/infer/lib/audio.py", line 63, in load_audio
audio2(f, out, "f32le", sr)
File "/media/disk1_ssd/Rvc/infer/lib/audio.py", line 34, in audio2
inp = av.open(i, "rb")
File "av/container/core.pyx", line 401, in av.container.core.open
File "av/container/core.pyx", line 265, in av.container.core.Container.__cinit__
File "av/container/core.pyx", line 285, in av.container.core.Container.err_check
File "av/error.pyx", line 336, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: '/media/disk1_ssd/voc/.DS_Store'
/media/disk1_ssd/voc/vocal_6.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_5.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_1.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_4.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_3.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_7.mp3_10.wav->Suc.
/media/disk1_ssd/voc/vocal_2.mp3_10.wav->Suc.
end preprocess
2023-11-19 23:54:18 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 200 OK"
2023-11-19 23:54:18 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/extract/extract_f0_rmvpe.py 4 0 0 "/media/disk1_ssd/Rvc/logs/trial" True
2023-11-19 23:54:18 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/extract/extract_f0_rmvpe.py 4 1 1 "/media/disk1_ssd/Rvc/logs/trial" True
2023-11-19 23:54:18 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/extract/extract_f0_rmvpe.py 4 2 0 "/media/disk1_ssd/Rvc/logs/trial" True
2023-11-19 23:54:18 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/extract/extract_f0_rmvpe.py 4 3 1 "/media/disk1_ssd/Rvc/logs/trial" True
['infer/modules/train/extract/extract_f0_rmvpe.py', '4', '0', '0', '/media/disk1_ssd/Rvc/logs/trial', 'True']
todo-f0-82
f0ing,now-0,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/1_0.wav
['infer/modules/train/extract/extract_f0_rmvpe.py', '4', '3', '1', '/media/disk1_ssd/Rvc/logs/trial', 'True']
todo-f0-81
f0ing,now-0,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/1_12.wav
['infer/modules/train/extract/extract_f0_rmvpe.py', '4', '1', '1', '/media/disk1_ssd/Rvc/logs/trial', 'True']
todo-f0-82
f0ing,now-0,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/1_10.wav
['infer/modules/train/extract/extract_f0_rmvpe.py', '4', '2', '0', '/media/disk1_ssd/Rvc/logs/trial', 'True']
todo-f0-81
f0ing,now-0,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/1_11.wav
Loading rmvpe model
Loading rmvpe model
Loading rmvpe model
Loading rmvpe model
f0ing,now-16,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_15.wav
f0ing,now-16,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_17.wav
f0ing,now-32,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_95.wav
f0ing,now-16,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_16.wav
f0ing,now-16,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_148.wav
f0ing,now-32,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_98.wav
f0ing,now-48,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/4_19.wav
f0ing,now-32,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_94.wav
f0ing,now-32,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/2_96.wav
f0ing,now-48,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/4_20.wav
f0ing,now-64,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_10.wav
f0ing,now-48,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/4_18.wav
f0ing,now-64,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_12.wav
f0ing,now-48,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/4_2.wav
f0ing,now-80,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_90.wav
f0ing,now-64,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_1.wav
f0ing,now-64,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_11.wav
f0ing,now-80,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_92.wav
f0ing,now-80,all-82,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_9.wav
f0ing,now-80,all-81,-/media/disk1_ssd/Rvc/logs/trial/1_16k_wavs/7_91.wav
2023-11-19 23:54:26 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/extract_feature_print.py cuda:0 1 0 0 "/media/disk1_ssd/Rvc/logs/trial" v2
['infer/modules/train/extract_feature_print.py', 'cuda:0', '1', '0', '0', '/media/disk1_ssd/Rvc/logs/trial', 'v2']
/media/disk1_ssd/Rvc/logs/trial
load model(s) from assets/hubert/hubert_base.pt
2023-11-19 23:54:26 | INFO | fairseq.tasks.hubert_pretraining | current directory is /media/disk1_ssd/Rvc
2023-11-19 23:54:26 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-11-19 23:54:26 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
move model to cuda
all-feature-326
now-326,all-0,1_0.wav,(149, 768)
now-326,all-32,2_11.wav,(155, 768)
now-326,all-64,2_148.wav,(134, 768)
now-326,all-96,2_56.wav,(105, 768)
now-326,all-128,2_94.wav,(149, 768)
now-326,all-160,3_42.wav,(149, 768)
now-326,all-192,4_18.wav,(149, 768)
now-326,all-224,5_16.wav,(149, 768)
now-326,all-256,7_1.wav,(149, 768)
now-326,all-288,7_50.wav,(113, 768)
now-326,all-320,7_9.wav,(140, 768)
all-feature-done
2023-11-19 23:54:33 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 200 OK"
2023-11-19 23:54:33 | INFO | __main__ | Use gpus: 0
2023-11-19 23:54:33 | INFO | __main__ | "/home/miniconda3/envs/cvr/bin/python" infer/modules/train/train.py -e "trial" -sr 40k -f0 1 -bs 12 -g 0 -te 20 -se 5 -pg assets/pretrained_v2/f0G40k.pth -pd assets/pretrained_v2/f0D40k.pth -l 0 -c 0 -sw 0 -v v2
INFO:trial:{'data': {'filter_length': 2048, 'hop_length': 400, 'max_wav_value': 32768.0, 'mel_fmax': None, 'mel_fmin': 0.0, 'n_mel_channels': 125, 'sampling_rate': 40000, 'win_length': 2048, 'training_files': './logs/trial/filelist.txt'}, 'model': {'filter_channels': 768, 'gin_channels': 256, 'hidden_channels': 192, 'inter_channels': 192, 'kernel_size': 3, 'n_heads': 2, 'n_layers': 6, 'p_dropout': 0, 'resblock': '1', 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'resblock_kernel_sizes': [3, 7, 11], 'spk_embed_dim': 109, 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'upsample_rates': [10, 10, 2, 2], 'use_spectral_norm': False}, 'train': {'batch_size': 12, 'betas': [0.8, 0.99], 'c_kl': 1.0, 'c_mel': 45, 'epochs': 20000, 'eps': 1e-09, 'fp16_run': False, 'init_lr_ratio': 1, 'learning_rate': 0.0001, 'log_interval': 200, 'lr_decay': 0.999875, 'seed': 1234, 'segment_size': 12800, 'warmup_epochs': 0}, 'model_dir': './logs/trial', 'experiment_dir': './logs/trial', 'save_every_epoch': 5, 'name': 'trial', 'total_epoch': 20, 'pretrainG': 'assets/pretrained_v2/f0G40k.pth', 'pretrainD': 'assets/pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 0, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Process Process-2:
Traceback (most recent call last):
File "/home/miniconda3/envs/cvr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
self.run()
File "/home/miniconda3/envs/cvr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/media/disk1_ssd/Rvc/infer/modules/train/train.py", line 137, in run
torch.cuda.set_device(rank)
File "/home/miniconda3/envs/cvr/lib/python3.8/site-packages/torch/cuda/__init__.py", line 314, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
DEBUG:infer.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
Process Process-1:
Traceback (most recent call last):
File "/home/miniconda3/envs/cvr/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
self.run()
File "/home/miniconda3/envs/cvr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/media/disk1_ssd/Rvc/infer/modules/train/train.py", line 205, in run
net_g = DDP(net_g, device_ids=[rank])
File "/home/miniconda3/envs/cvr/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
_verify_param_shape_across_processes(self.process_group, parameters)
File "/home/miniconda3/envs/cvr/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [fca1:de0c:6dd9:56e9:448a::1]:21721
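"Invalid device ordinal" means a spawned rank called `torch.cuda.set_device(1)` on a machine where only device 0 is usable, so the `-g 0` flag apparently did not stop a second worker from being spawned. One way to guarantee a truly single-GPU run is to restrict device visibility before any CUDA context is created, e.g. by launching the trainer with a restricted environment (a sketch, not RVC's own launch code):

```python
import os

def single_gpu_env(gpu_index=0):
    """Copy of the current environment in which only one GPU is visible to CUDA."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # must be set before torch creates a CUDA context
    return env

# Hypothetical usage with the trainer script from the log:
# import subprocess
# subprocess.run(["python", "infer/modules/train/train.py", "-g", "0"],
#                env=single_gpu_env(0))
```

With only one device visible, any stray rank that tries to select device 1 fails immediately and obviously, rather than after distributed setup.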
My solution: use one GPU to train for one or more epochs to create a checkpoint, then continue training on multiple GPUs. It seems RVC can train on multiple GPUs when it loads an existing checkpoint.
I have the same problem. Have you fixed this problem?
I also have the same problem, trying to solve
I have solved this problem, and I think adding `CUDA_LAUNCH_BLOCKING=1` works.
I also have the same problem, trying to solve
Using the latest TAG can solve the problem
same error for me, did anyone find a solution ?
I also have the same problem and am trying to solve it.
Using the latest tag solves the problem.
I used the October tar package and still ran into this problem, and it does not support multi-GPU training: IndexError: list index out of range
This issue was closed because it has been inactive for 15 days since being marked as stale.