
torch.cuda.OutOfMemoryError: CUDA out of memory

GCVillager opened this issue 2 years ago · 1 comment

My GPU is an RTX 3060 with 12 GB of VRAM. With batch_size at the default of 6, training ran out of VRAM. Lowering batch_size to 1 still fails with a similar error.

(rvc) F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI>python infer-web.py
Use Language: zh_CN
Running on local URL:  http://0.0.0.0:7865
start preprocess
['trainset_preprocess_pipeline_print.py', 'F:\\AI\\ganyu_wav', '40000', '8', 'F:\\AI\\rvc\\Retrieval-based-Voice-Conversion-WebUI/logs/ganyu', 'False']
(preprocessing output omitted)
end preprocess

['extract_feature_print.py', 'cuda:0', '1', '0', '0', 'F:\\AI\\rvc\\Retrieval-based-Voice-Conversion-WebUI/logs/ganyu', 'v1']
F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI/logs/ganyu
load model(s) from hubert_base.pt
2023-07-12 20:27:00 | INFO | fairseq.tasks.hubert_pretraining | current directory is F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI
2023-07-12 20:27:00 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-07-12 20:27:00 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
move model to cuda
all-feature-823
now-823,all-820,99_2.wav,(95, 256)
all-feature-done

INFO:ganyu:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\ganyu/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\ganyu', 'experiment_dir': './logs\\ganyu', 'save_every_epoch': 5, 'name': 'ganyu', 'total_epoch': 20, 'pretrainG': 'pretrained/G40k.pth', 'pretrainD': 'pretrained/D40k.pth', 'version': 'v1', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 0, 'if_latest': 0, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
INFO:ganyu:loaded pretrained pretrained/G40k.pth
<All keys matched successfully>
INFO:ganyu:loaded pretrained pretrained/D40k.pth
<All keys matched successfully>
D:\miniconda3\envs\rvc\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:867.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
Process Process-1:
Traceback (most recent call last):
  File "D:\miniconda3\envs\rvc\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "D:\miniconda3\envs\rvc\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 223, in run
    train_and_evaluate(
  File "F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 430, in train_and_evaluate
    y_d_hat_r, y_d_hat_g, _, _ = net_d(wave, y_hat.detach())
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\parallel\distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\parallel\distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI\infer_pack\models.py", line 976, in forward
    y_d_r, fmap_r = d(y)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\AI\rvc\Retrieval-based-Voice-Conversion-WebUI\infer_pack\models.py", line 1117, in forward
    x = l(x)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\miniconda3\envs\rvc\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 1.84 GiB already allocated; 9.05 GiB free; 1.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

GCVillager avatar Jul 12 '23 12:07 GCVillager
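The hint at the end of the OOM message (`max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`) is worth trying before anything else, since 9.05 GiB free with only a 20 MiB allocation failing points at fragmentation or an allocator problem rather than a genuinely full GPU. A minimal sketch, assuming the variable is set before PyTorch first touches CUDA; the value `128` is only an illustrative guess, not something from this thread:

```python
import os

# Must be set before the first CUDA allocation, i.e. before `import torch`
# runs in the training process. 128 MiB is an example split size, not a
# recommendation from the RVC project.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Setting the variable in the shell before launching `python infer-web.py` has the same effect and avoids editing any project code.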

我的也是爆显存,就是WSL ubuntu部署会这样,但Windows下就没啥问题

hpx502766238 avatar Jul 12 '23 14:07 hpx502766238
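Since the failure here only shows up under WSL Ubuntu and not native Windows, it can help to confirm at startup which environment the script is actually running in. A small sketch (the helper name `running_under_wsl` is made up for illustration):

```python
def running_under_wsl() -> bool:
    # WSL kernels identify themselves with "microsoft" in /proc/version;
    # on native Windows (or if the file is unreadable) assume not WSL.
    try:
        with open("/proc/version") as f:
            return "microsoft" in f.read().lower()
    except OSError:
        return False

print(running_under_wsl())
```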

降级了cuda的版本,已经解决。

GCVillager avatar Jul 14 '23 08:07 GCVillager
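For anyone trying the same fix: the CUDA toolkit version a PyTorch build was compiled against can be read from Python, which makes it easy to confirm that a downgrade actually took effect. A guarded sketch (the function name is hypothetical; the thread does not say which versions were involved):

```python
def report_cuda_build() -> str:
    # Returns the CUDA toolkit version this PyTorch build was compiled
    # against, or a placeholder string when torch is unavailable.
    try:
        import torch
        return torch.version.cuda or "cpu-only build"
    except ImportError:
        return "torch not installed"

print(report_cuda_build())
```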