Retrieval-based-Voice-Conversion-WebUI [IPEX] RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

After updating the packages needed to detect my GPU, I ran into a new problem with voice inference. After I specify the path to the file I want to convert, inference fails with the traceback shown below.

2024-02-17 20:36:17 | WARNING | infer.modules.vc.modules | Traceback (most recent call last):
  File "/home/owner/rvc/infer/modules/vc/modules.py", line 186, in vc_single
    audio_opt = self.pipeline.pipeline(
  File "/home/owner/rvc/infer/modules/vc/pipeline.py", line 410, in pipeline
    self.vc(
  File "/home/owner/rvc/infer/modules/vc/pipeline.py", line 219, in vc
    logits = model.extract_features(**inputs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 535, in extract_features
    res = self.forward(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 467, in forward
    x, _ = self.encoder(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1003, in forward
    x, layer_results = self.extract_features(x, padding_mask, layer)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1049, in extract_features
    x, (z, lr) = layer(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1260, in forward
    x, attn = self.self_attn(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/modules/multihead_attention.py", line 538, in forward
    return F.multi_head_attention_forward(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 5443, in multi_head_attention_forward
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

Traceback (most recent call last):
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1349, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1283, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/components.py", line 2586, in postprocess
    file_path = self.audio_to_temp_file(
  File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/components.py", line 360, in audio_to_temp_file
    temp_dir = Path(dir) / self.hash_bytes(data.tobytes())
AttributeError: 'NoneType' object has no attribute 'tobytes'
2024-02-17 20:36:17 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 500 Internal Server Error"
2024-02-17 20:36:17 | INFO | httpx | HTTP Request: POST http://localhost:7865/reset "HTTP/1.1 200 OK"

I'm really hoping the fix is as simple as updating a package, and I'd like to know whether I'm the only one with this problem. It occurs on both Windows and Linux. Thank you in advance.

Feb 18 '24 04:02 7rident

I have the same problem. Have you solved it?

Feb 22 '24 09:02 xiaolibuzai-ovo

Okay, I refreshed the page, and then the problem disappeared...

Feb 22 '24 12:02 xiaolibuzai-ovo

Audio path or pitch are not provided

Feb 22 '24 14:02 Abedalhkeem-z

Audio path or pitch are not provided

The path is valid, and I am providing it the same way I did before this issue appeared. The only change I can think of is that the Intel binaries had to be updated, because the GPU wasn't detected on the old versions.

Feb 23 '24 19:02 7rident

Hoping there's a solution for this issue, since even the dev branch throws the same errors. I think the problem stems from this error: RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
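For anyone unfamiliar with this class of error, here is a minimal sketch of how it arises in plain PyTorch on the CPU, independent of RVC or IPEX. The tensor names and shapes are made up for illustration, and the exact wording of the message can differ between PyTorch versions and backends. It only shows what the error means, not where the mismatch originates in RVC.

```python
import torch
import torch.nn.functional as F

# Hypothetical tensors: a float32 activation and a float16 (Half) weight,
# mirroring the "Float and Half" pair named in the error message.
x = torch.randn(4, 8)                   # float32 input ("mat1")
weight = torch.randn(16, 8).half()      # float16 weight ("mat2")


def try_linear(inp, w):
    """Return the output of F.linear, or the error text if the call fails."""
    try:
        return F.linear(inp, w)
    except RuntimeError as err:
        return str(err)


mismatch = try_linear(x, weight)         # mixed dtypes: raises a RuntimeError
matched = try_linear(x, weight.float())  # both float32: succeeds

print("mixed dtypes ->", mismatch)
print("same dtype   ->", matched.shape, matched.dtype)
```

The second call works only because both operands share one dtype. Whether the right fix in a real model is to cast the input, cast the weights, or disable half precision depends on where the mismatch comes from.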

Mar 07 '24 19:03 7rident

hmm, did you install the libraries from the requirements?

Mar 07 '24 20:03 Abedalhkeem-z

hmm, did you install the libraries from the requirements?

If by that you mean the requirements file, then yes, with intel_extension_for_pytorch, torch, torchaudio, and torchvision updated to their respective latest versions from Intel's XPU repository. Without those updates, the GPU is not detected.

Mar 07 '24 21:03 7rident

Closing due to skill issue on my part (involving packages) and Arch not updating their packages when they really should. Thank you for the help regardless

Mar 21 '24 06:03 7rident

No clue how I got it working before (on Linux), but the issue came up again. Originally I was looking at the wrong thing; the real issue is the dtype mismatch between Float and Half. The log of what happens is included below. Devs, I got this working on Windows by using edited pip packages with DLLs.

2024-04-29 20:16:33 | INFO | configs.config | Found GPU Intel(R) Arc(TM) A750 Graphics
2024-04-29 20:16:33 | INFO | configs.config | Use xpu:0 instead
2024-04-29 20:16:33 | INFO | configs.config | Half-precision floating-point: True, device: xpu:0
2024-04-29 20:16:38 | INFO | httpx | HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
2024-04-29 20:16:49 | INFO | infer.lib.rvcmd | checking hubret & rmvpe...
2024-04-29 20:16:53 | INFO | infer.lib.rvcmd | checking pretrained models...
2024-04-29 20:17:00 | INFO | infer.lib.rvcmd | checking pretrained models v2...
2024-04-29 20:17:08 | INFO | infer.lib.rvcmd | checking uvr5_weights...
2024-04-29 20:17:12 | INFO | infer.lib.rvcmd | all assets are already latest.
2024-04-29 20:17:12 | INFO | main | Use Language: en_US
Running on local URL: http://0.0.0.0:7865
2024-04-29 20:17:22 | INFO | infer.modules.vc.modules | Get sid: Harley.pth
2024-04-29 20:17:22 | INFO | infer.modules.vc.modules | Loading: assets/weights/Harley.pth
2024-04-29 20:17:25 | INFO | infer.modules.vc.modules | Select index: logs\Harley\added_IVF3130_Flat_nprobe_1_Harley_v2.index
2024-04-29 20:17:48 | INFO | fairseq.tasks.hubert_pretraining | current directory is F:\rvc
2024-04-29 20:17:48 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-04-29 20:17:48 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
2024-04-29 20:17:52 | INFO | infer.modules.vc.pipeline | Loading rmvpe model,assets/rmvpe/rmvpe.pt
FFT WARNING: INPUT_STRIDES and OUTPUT_STRIDES are deprecated: please use FWD_STRIDES and BWD_STRIDES, instead.
FFT WARNING: INPUT_STRIDES and OUTPUT_STRIDES are deprecated: please use FWD_STRIDES and BWD_STRIDES, instead.
2024-04-29 20:18:18 | WARNING | infer.modules.vc.modules | Traceback (most recent call last):
  File "F:\rvc\infer\modules\vc\modules.py", line 188, in vc_single
    audio_opt = self.pipeline.pipeline(
  File "F:\rvc\infer\modules\vc\pipeline.py", line 428, in pipeline
    self.vc(
  File "F:\rvc\infer\modules\vc\pipeline.py", line 237, in vc
    logits = model.extract_features(**inputs)
  File "F:\rvc.venv\lib\site-packages\fairseq\models\hubert\hubert.py", line 535, in extract_features
    res = self.forward(
  File "F:\rvc.venv\lib\site-packages\fairseq\models\hubert\hubert.py", line 467, in forward
    x, _ = self.encoder(
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1003, in forward
    x, layer_results = self.extract_features(x, padding_mask, layer)
  File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1049, in extract_features
    x, (z, lr) = layer(
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1260, in forward
    x, attn = self.self_attn(
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\rvc.venv\lib\site-packages\fairseq\modules\multihead_attention.py", line 538, in forward
    return F.multi_head_attention_forward(
  File "F:\rvc.venv\lib\site-packages\torch\nn\functional.py", line 5443, in multi_head_attention_forward
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

I am sorry for opening this up, and hopefully someone can get me pointed in the right direction. Thank you again
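The log above reports "Half-precision floating-point: True", and the error implies that somewhere a float32 tensor meets a float16 one. One way to check whether a loaded model mixes dtypes is to count its parameters and buffers by dtype. The sketch below is a generic PyTorch helper, not part of RVC or fairseq. The name dtype_summary and the toy model are hypothetical and only stand in for a real loaded model.

```python
from collections import Counter

import torch
from torch import nn


def dtype_summary(module: nn.Module) -> Counter:
    """Count a module's parameters and buffers by dtype."""
    counts = Counter()
    for tensor in list(module.parameters()) + list(module.buffers()):
        counts[str(tensor.dtype)] += 1
    return counts


# Hypothetical stand-in for a real model: one layer cast to half,
# one layer left in float32, so the summary shows a mix.
toy = nn.Sequential(nn.Linear(8, 8).half(), nn.Linear(8, 4))
summary = dtype_summary(toy)
print(summary)
```

If a real model shows more than one floating-point dtype here, that would be a reasonable place to start looking, although it does not by itself identify the failing operation.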

Apr 30 '24 00:04 7rident

I'm having the exact same issue with a WSL build (incidentally if you have any advice on how to do the witchcraft to get it to run on Windows by itself, I'd love to hear it).

Apr 30 '24 21:04 LunaticBlood

I am having the same issue on Linux Mint 22.04 and Intel Arc A770. Any help from the dev would be appreciated! @RVC-Boss

Jun 28 '24 15:06 LovelyA72

Also, having the GitHub bot mark the issue as stale and close it adds nothing but frustration.

Jun 28 '24 15:06 LovelyA72

I am having the same issue on Linux Mint 22.04 and Intel Arc A770. Any help from the dev would be appreciated! @RVC-Boss

In your rvc directory, open the file infer/lib/infer_pack/models.py and go to line 382 (the line number should be about the same; the original line does not have the .to('cpu') at the end). Change that line to tmp_over_one.transpose(2, 1).to('cpu') and then, a few lines down (line 386 to be exact), edit the line so it reads ).transpose(2, 1).to(tmp_over_one.device) Keep all indentation intact.

This is what worked for me, and I thank @a-One-Fan for this information.
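The pattern behind that edit is a CPU round trip: move the tensor to the CPU, run the operation there, then move the result back to the tensor's original device. Below is a minimal sketch of that pattern. The thread does not show the call that encloses the two edited lines, so the use of torch.nn.functional.interpolate with these arguments is an assumption, and the function name interpolate_via_cpu and the sample tensor are hypothetical.

```python
import torch
import torch.nn.functional as F


def interpolate_via_cpu(tmp_over_one: torch.Tensor, upp: int) -> torch.Tensor:
    """Upsample along the time axis on the CPU, then return to the source device."""
    out = F.interpolate(
        tmp_over_one.transpose(2, 1).to("cpu"),  # first edit: run the op on the CPU
        scale_factor=float(upp),                 # assumed arguments, not from the thread
        mode="linear",
        align_corners=True,
    )
    # Second edit: restore the layout and move back to the original device.
    return out.transpose(2, 1).to(tmp_over_one.device)


# Hypothetical input shaped (batch, time, channels).
x = torch.arange(6, dtype=torch.float32).reshape(1, 3, 2)
y = interpolate_via_cpu(x, upp=4)
print(y.shape, y.device)
```

On a machine without an XPU this runs entirely on the CPU, so it only demonstrates the shape and device handling, not the fix itself. The round trip costs a device transfer on every call.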

Jun 29 '24 03:06 7rident

I forgot to mention this earlier: the fix I posted above solved the problem I originally had. I am closing this issue as a result, and hopefully it helps some others too.

Jul 01 '24 17:07 7rident
