Retrieval-based-Voice-Conversion-WebUI

48k v2 training is not supported yet on macOS

Open · Naozumi520 opened this issue · 3 comments

INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
INFO:UD:loaded pretrained pretrained_v2/f0G48k.pth
<All keys matched successfully>
INFO:UD:loaded pretrained pretrained_v2/f0D48k.pth
<All keys matched successfully>
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:120: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.")
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:867.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/amp/autocast_mode.py:204: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Process Process-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/train_nsf_sim_cache_sid_load_pretrain.py", line 223, in run
    train_and_evaluate(
  File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/train_nsf_sim_cache_sid_load_pretrain.py", line 392, in train_and_evaluate
    ) = net_g(phone, phone_lengths, pitch, pitchf, spec, spec_lengths, sid)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1113, in _run_ddp_forward
    return module_to_run(*inputs, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/lib/infer_pack/models.py", line 747, in forward
    o = self.dec(z_slice, pitchf, g=g)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/lib/infer_pack/models.py", line 495, in forward
    har_source, noi_source, uv = self.m_source(f0, self.upp)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/lib/infer_pack/models.py", line 418, in forward
    sine_merge = self.l_tanh(self.l_linear(sine_wavs))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype

Naozumi520 · Jul 12 '23
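The final `RuntimeError` is raised by `F.linear`, which requires its input and weight tensors to share a dtype. A minimal sketch that reproduces the same class of error on CPU follows. It assumes, as the `is_half` branch quoted later in this thread suggests, that the input was cast to float16 while the layer's weights stayed float32. The tensor sizes are arbitrary, and the exact error message wording varies across PyTorch versions:

```python
import torch
import torch.nn.functional as F

# float32 weights, as a model that was never converted with .half() would have
weight = torch.randn(1, 9, dtype=torch.float32)
bias = torch.zeros(1, dtype=torch.float32)

# float16 input, as produced by a `.half()` cast on the activations
x = torch.randn(4, 9).half()

try:
    F.linear(x, weight, bias)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

# Casting the input to the weight's dtype makes the same call succeed
out = F.linear(x.to(weight.dtype), weight, bias)
print(mismatch_error)
print(out.dtype)  # torch.float32
```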

What model is your Mac? Do you have an NVIDIA graphics card?

RVC-Boss · Jul 12 '23

It's a 16-inch MacBook Pro (2019). I don't have an NVIDIA graphics card; it has an AMD Radeon Pro 5500M instead. Training 40k v1/v2 and 48k v1 works using MPS.

Naozumi520 · Jul 12 '23
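The `GradScaler` and `autocast` warnings near the top of the log appear because the training script requests CUDA mixed precision on a machine without CUDA. The sketch below picks a device and enables the CUDA-only AMP helper only when CUDA is actually present. It is not RVC's own device-selection code, and `fp16_requested` is a hypothetical stand-in for whatever half-precision flag the training config provides:

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is only present in PyTorch >= 1.12
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


fp16_requested = True  # hypothetical config flag, not an RVC setting name
device = pick_device()

# Only enable CUDA mixed precision on CUDA; on MPS/CPU everything stays float32
use_amp = fp16_requested and device.type == "cuda"
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
print(device, use_amp)
```

Keeping the half-precision decision tied to the device means no later code path has to guess whether activations should be cast to float16.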

On an M1 Mac, I was able to train 48k v2 after commenting out the following lines in models.py:

        if self.is_half:
            sine_wavs = sine_wavs.half()

Naturalclar · Jul 14 '23

File "/Users/naozumi/Downloads/RVC/Retrieval-based-Voice-Conversion-WebUI-updated0618v2/lib/infer_pack/models.py", line 418, in forward
    sine_merge = self.l_tanh(self.l_linear(sine_wavs))

You can also fix it by changing this line to:

sine_merge = self.l_tanh(self.l_linear(sine_wavs.to(self.l_linear.weight.dtype)))

RVC-Boss · Aug 14 '23
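Both workarounds in this thread target the same float16/float32 mismatch. The sketch below compares the original `is_half` cast with the dtype-following cast suggested above. It uses a hypothetical `SineMergeSketch` module that mirrors only the two quoted lines; the real class in `lib/infer_pack/models.py` does more, and its constructor arguments are not shown in this thread. The layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn


class SineMergeSketch(nn.Module):
    """Hypothetical stand-in for the sine-merge step quoted from models.py."""

    def __init__(self, in_features: int, is_half: bool, follow_weight_dtype: bool):
        super().__init__()
        self.is_half = is_half
        self.follow_weight_dtype = follow_weight_dtype
        self.l_linear = nn.Linear(in_features, 1)  # float32 weights by default
        self.l_tanh = nn.Tanh()

    def forward(self, sine_wavs: torch.Tensor) -> torch.Tensor:
        if self.is_half:
            sine_wavs = sine_wavs.half()  # the cast that breaks float32 training
        if self.follow_weight_dtype:
            # Suggested fix: match whatever dtype the linear layer's weights use
            sine_wavs = sine_wavs.to(self.l_linear.weight.dtype)
        return self.l_tanh(self.l_linear(sine_wavs))


sine_wavs = torch.randn(2, 100, 9)  # (batch, samples, features), sizes illustrative

broken = SineMergeSketch(9, is_half=True, follow_weight_dtype=False)
try:
    broken(sine_wavs)
    failed = False
except RuntimeError:
    failed = True

fixed = SineMergeSketch(9, is_half=True, follow_weight_dtype=True)
out = fixed(sine_wavs)
print(failed, out.dtype, tuple(out.shape))
```

With float32 weights, deleting the cast (the change described above for M1 Macs) and following the weight dtype behave the same. The dtype-following version also keeps working if the model weights are converted to float16, because the input is then cast to float16 to match.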