
Training is now completely unusable

Open · allroundHim opened this issue 4 months ago • 0 comments

My training behaves strangely: it can only produce the index file, not the model file.

OS: Win10, GPU: NVIDIA 4080S

When I click the "Train model" button, it prints this:

Traceback (most recent call last):
  File "multiprocessing\process.py", line 315, in _bootstrap
  File "multiprocessing\process.py", line 108, in run
  File "C:\RVC\infer\modules\train\train.py", line 233, in run
    net_g.module.load_state_dict(
  File "C:\RVC\runtime\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SynthesizerTrnMs768NSFsid:
        size mismatch for dec.ups.0.weight_v: copying a param with shape torch.Size([512, 256, 24]) from checkpoint, the shape in current model is torch.Size([512, 256, 16]).
        size mismatch for dec.ups.1.weight_v: copying a param with shape torch.Size([256, 128, 20]) from checkpoint, the shape in current model is torch.Size([256, 128, 16]).
2024-09-28 20:26:31 | INFO | __main__ | Use gpus: 0
2024-09-28 20:26:31 | INFO | __main__ | "runtime\python.exe" infer/modules/train/train.py -e "test_project" -sr 40k -f0 1 -bs 8 -g 0 -te 20 -se 5 -pg assets/pretrained_v2/f0G48k.pth -pd assets/pretrained_v2/f0D48k.pth -l 0 -c 1 -sw 0 -v v2
INFO:test_project:{'data': {'filter_length': 2048, 'hop_length': 400, 'max_wav_value': 32768.0, 'mel_fmax': None, 'mel_fmin': 0.0, 'n_mel_channels': 125, 'sampling_rate': 40000, 'win_length': 2048, 'training_files': './logs\\test_project/filelist.txt'}, 'model': {'filter_channels': 768, 'gin_channels': 256, 'hidden_channels': 192, 'inter_channels': 192, 'kernel_size': 3, 'n_heads': 2, 'n_layers': 6, 'p_dropout': 0, 'resblock': '1', 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'resblock_kernel_sizes': [3, 7, 11], 'spk_embed_dim': 109, 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'upsample_rates': [10, 10, 2, 2], 'use_spectral_norm': False}, 'train': {'batch_size': 8, 'betas': [0.8, 0.99], 'c_kl': 1.0, 'c_mel': 45, 'epochs': 20000, 'eps': 1e-09, 'fp16_run': True, 'init_lr_ratio': 1, 'learning_rate': 0.0001, 'log_interval': 200, 'lr_decay': 0.999875, 'seed': 1234, 'segment_size': 12800, 'warmup_epochs': 0}, 'model_dir': './logs\\test_project', 'experiment_dir': './logs\\test_project', 'save_every_epoch': 5, 'name': 'test_project', 'total_epoch': 20, 'pretrainG': 'assets/pretrained_v2/f0G48k.pth', 'pretrainD': 'assets/pretrained_v2/f0D48k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 0, 'save_every_weights': '0', 'if_cache_data_in_gpu': 1}
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [mytestserver.com]:37197 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [mytestserver.com]:37197 (system error: 10049 - The requested address is not valid in its context.).
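The two `size mismatch` lines can be reproduced by hand from the numbers in the log. Below is a minimal sketch, assuming only the shapes printed in the traceback and the 40k `model` config from the INFO line; `find_shape_mismatches` is a hypothetical helper, not part of RVC or PyTorch, and it only mimics the shape comparison that `load_state_dict` reports. The `weight_v` tensors belong to weight-normed `ConvTranspose1d` layers, whose weight layout is `(in_channels, out_channels, kernel_size)`, so the last dimension is the upsample kernel size. The checkpoint's kernel widths (24, 20) differ from the config's `upsample_kernel_sizes` `[16, 16, 4, 4]`; note that the command line in the log passes `-sr 40k` together with `-pg assets/pretrained_v2/f0G48k.pth`, which suggests the pretrained generator was built for a different sample rate than the one being trained.

```python
# Hypothetical helper: compare parameter shapes from a checkpoint against the
# shapes the freshly built model expects, similar to the "size mismatch" check
# reported by torch.nn.Module.load_state_dict.

def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, ckpt_shape, model_shape) for every shared key whose shapes differ."""
    mismatches = []
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes copied from the traceback (checkpoint side).
checkpoint_shapes = {
    "dec.ups.0.weight_v": (512, 256, 24),
    "dec.ups.1.weight_v": (256, 128, 20),
}

# Model side, derived from the 40k config printed in the log:
# upsample_initial_channel = 512, upsample_kernel_sizes = [16, 16, 4, 4].
# Each upsample layer halves the channel count (512 -> 256 -> 128 ...).
upsample_initial_channel = 512
upsample_kernel_sizes = [16, 16, 4, 4]
model_shapes = {
    f"dec.ups.{i}.weight_v": (
        upsample_initial_channel // (2 ** i),
        upsample_initial_channel // (2 ** (i + 1)),
        k,
    )
    for i, k in enumerate(upsample_kernel_sizes[:2])  # only the two layers in the error
}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

Running it prints one line per mismatched layer, matching the two entries in the traceback.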

It's ridiculous: it even says it is referring to "C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp", but I don't even have a folder called actions-runner on my C drive. And, ridiculously, the address it is connecting to is my custom sample server.

allroundHim, Sep 28 '24 12:09