Retrieval-based-Voice-Conversion-WebUI

Training fails on an Intel GPU: the index is built successfully, but no .pth file appears in the weight folder

Open laoshengji opened this issue 1 year ago • 2 comments

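Problem description: I trained on an Intel Arc A770 GPU. The UI reported that the whole pipeline had finished and that the index was built successfully. The logs folder contains an index file whose name starts with "add", but the weight folder contains no .pth file. The command-line window showed the following errors during the run: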

f0fail-14-C:\software\RVC1006AMD_Intel1/logs/pyxpu/1_16k_wavs/0_124.wav-Traceback (most recent call last):
  File "infer/modules/train/extract/extract_f0_rmvpe.py", line 89, in go
    featur_pit = self.compute_f0(inp_path, f0_method)
  File "infer/modules/train/extract/extract_f0_rmvpe.py", line 52, in compute_f0
    self.model_rmvpe = RMVPE(
  File "C:\software\RVC1006AMD_Intel1\infer\lib\rmvpe.py", line 503, in __init__
    self.mel_extractor = MelSpectrogram(
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 1173, in to
    return self._apply(convert)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 853, in _apply
    self._buffers[key] = fn(buf)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 1159, in convert
    return t.to(
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:infer.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
Process Process-1:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\envs\tor\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ProgramData\miniconda3\envs\tor\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\software\RVC1006AMD_Intel1\infer\modules\train\train.py", line 184, in run
    net_g = net_g.cuda(rank)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 915, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
    param_applied = fn(param)
  File "C:\ProgramData\miniconda3\envs\tor\lib\site-packages\torch\nn\modules\module.py", line 915, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: Invalid device, must be cuda device

System configuration
Processor: 13th Gen Intel(R) Core(TM) i7-13700K 3.40 GHz
GPU: Intel(R) Arc(TM) A770 Graphics
Operating system: Windows 11 Pro 23H2
RVC code version: 2.2.231006

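Installation steps

  1. The Intel GPU driver has been updated to the latest version, 32.0.101.6078 (dated 2024/9/13).
  2. oneAPI is installed, and training is run inside the oneAPI environment.
  3. Visual Studio Build Tools has been added to the system environment variables.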
conda create -n tor python==3.8.0
conda activate tor
python -m pip install pip==22.3.1
pip cache purge
conda install pkg-config libuv
pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
pip install -r requirements-dml.txt
pip install --force-reinstall -v "av==11.0.0"
pip install gradio==3.48.0
conda install -c anaconda libjpeg-turbo libpng
pip install transformers

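My question: The errors are about CUDA, whereas Intel GPUs use the XPU API. I found an ipex module in the code that replaces cuda with xpu, so training on Intel GPUs should be supported, and ideally every cuda call would be redirected to xpu. So why does the error say that only CUDA devices are supported? I would like to understand which of these is the cause: Intel GPUs do not support training at all, the code has not replaced every cuda call, or the torch version I installed is wrong.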
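
One way to narrow this down is to check which backend PyTorch actually sees before training starts. The sketch below is an illustration, not RVC's own code: `probe_backends` and `pick_device` are hypothetical helper names, and it assumes that importing `intel_extension_for_pytorch` is what makes `torch.xpu` usable in this environment. If that import fails or `torch.xpu.is_available()` returns False, any cuda-to-xpu replacement that depends on it would presumably not take effect, and calls such as `net_g.cuda(rank)` would reach the real CUDA path, as in the traceback above.

```python
def probe_backends():
    """Return (has_cuda, has_xpu, note); does not raise if a package is missing."""
    try:
        import torch
    except Exception as exc:  # ImportError, or OSError from a missing DLL
        return False, False, f"torch import failed: {exc}"

    has_cuda = torch.cuda.is_available()
    has_xpu, note = False, ""
    try:
        # Assumption: this import is what registers the XPU backend here.
        import intel_extension_for_pytorch  # noqa: F401
        has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    except Exception as exc:
        note = f"intel_extension_for_pytorch import failed: {exc}"
    return has_cuda, has_xpu, note


def pick_device(has_cuda, has_xpu):
    """Hypothetical helper: prefer CUDA, then XPU, then fall back to CPU."""
    if has_cuda:
        return "cuda:0"
    if has_xpu:
        return "xpu:0"
    return "cpu"


if __name__ == "__main__":
    cuda_ok, xpu_ok, note = probe_backends()
    print("cuda available:", cuda_ok)
    print("xpu available:", xpu_ok)
    if note:
        print("note:", note)
    print("selected device:", pick_device(cuda_ok, xpu_ok))
```

Run it in the same conda environment and oneAPI shell used for training. If it reports that xpu is not available, the installation is the first thing to look at, before the training code.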

laoshengji, Oct 29 '24 23:10

You could try a newer version.

Sanguo1Caili, Jan 12 '25 00:01

Is your A770 detected? I followed the same steps to deploy, but my A750 is not detected.
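
Whether the card is detected can be checked directly from Python. This is a minimal sketch, assuming `intel_extension_for_pytorch` is installed in the active environment; `list_xpu_devices` and `format_report` are hypothetical helper names and are not part of RVC.

```python
def list_xpu_devices():
    """Return (device_names, note); device_names is empty if nothing is detected."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401
    except Exception as exc:  # missing package or missing runtime DLL
        return [], f"import failed: {exc}"

    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return [], "torch.xpu reports no available device"
    return [xpu.get_device_name(i) for i in range(xpu.device_count())], ""


def format_report(names, note):
    """Turn the probe result into printable text, one line per device."""
    if not names:
        return f"no XPU device detected ({note})"
    return "\n".join(f"[{i}] {name}" for i, name in enumerate(names))


if __name__ == "__main__":
    print(format_report(*list_xpu_devices()))
```

An empty list here points at the driver, oneAPI, or package installation and not at the RVC code itself.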

first-tarkie, Mar 08 '25 13:03