Retrieval-based-Voice-Conversion-WebUI
[IPEX] RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
After updating the packages needed to detect my GPU, I ran into a new problem with voice inference. After I specify the path to the file I want to convert, inference fails with the traceback shown below.
2024-02-17 20:36:17 | WARNING | infer.modules.vc.modules | Traceback (most recent call last):
File "/home/owner/rvc/infer/modules/vc/modules.py", line 186, in vc_single
audio_opt = self.pipeline.pipeline(
File "/home/owner/rvc/infer/modules/vc/pipeline.py", line 410, in pipeline
self.vc(
File "/home/owner/rvc/infer/modules/vc/pipeline.py", line 219, in vc
logits = model.extract_features(**inputs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 535, in extract_features
res = self.forward(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/hubert/hubert.py", line 467, in forward
x, _ = self.encoder(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1003, in forward
x, layer_results = self.extract_features(x, padding_mask, layer)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1049, in extract_features
x, (z, lr) = layer(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/models/wav2vec/wav2vec2.py", line 1260, in forward
x, attn = self.self_attn(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/fairseq/modules/multihead_attention.py", line 538, in forward
return F.multi_head_attention_forward(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 5443, in multi_head_attention_forward
attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
Traceback (most recent call last):
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1349, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1283, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/components.py", line 2586, in postprocess
file_path = self.audio_to_temp_file(
File "/home/owner/rvc/.venv/lib/python3.10/site-packages/gradio/components.py", line 360, in audio_to_temp_file
temp_dir = Path(dir) / self.hash_bytes(data.tobytes())
AttributeError: 'NoneType' object has no attribute 'tobytes'
2024-02-17 20:36:17 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 500 Internal Server Error"
2024-02-17 20:36:17 | INFO | httpx | HTTP Request: POST http://localhost:7865/reset "HTTP/1.1 200 OK"
I'm really hoping the fix is as simple as updating a package, or at least that I'm not the only one with this problem. It occurs on both Windows and Linux. Thank you in advance.
I have the same problem. Have you solved it?
Okay, I refreshed the page and the problem disappeared...
Audio path or pitch are not provided
The path is valid, and I am using the same method that worked before this issue appeared. The only thing I can think of is that the Intel binaries had to be updated, because the GPU was not detected with the old versions.
I'm hoping there's a solution, since the dev branch throws the same errors. I think the problem stems from this error: RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
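For anyone unfamiliar with this error, here is a minimal sketch of how it comes about. It is not RVC or fairseq code: the tensor shapes and the helper names `dtype_mismatch_message` and `linear_with_matching_dtype` are made up for illustration, and it runs on the CPU rather than on xpu. It feeds float32 activations into half-precision weights, which is the same kind of mismatch the traceback reports inside the attention output projection.

```python
import torch
import torch.nn.functional as F


def dtype_mismatch_message() -> str:
    """Multiply float32 activations by float16 weights and return the error text."""
    x = torch.randn(2, 8)               # float32 input, like un-cast features
    weight = torch.randn(4, 8).half()   # float16 weights, like a model after .half()
    try:
        F.linear(x, weight)
    except RuntimeError as err:
        return str(err)
    return ""  # no error was raised


def linear_with_matching_dtype(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Cast the weights to the input's dtype before the matmul so both operands agree."""
    return F.linear(x, weight.to(x.dtype))


if __name__ == "__main__":
    print(dtype_mismatch_message())
    out = linear_with_matching_dtype(torch.randn(2, 8), torch.randn(4, 8).half())
    print(out.dtype, out.shape)
```

The general point is that either the input or the weights must be cast so both sides share one dtype. Which side ends up in the wrong precision in the RVC pipeline on Intel GPUs is not something this sketch can tell you.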
hmm, did you install the libraries from the requirements?
If you mean the packages in the requirements file, then yes, with intel_extension_for_pytorch, torch, torchaudio, and torchvision updated to their latest versions from Intel's XPU repository. Without that update, the GPU is not detected.
Closing due to skill issue on my part (involving packages) and Arch not updating their packages when they really should. Thank you for the help regardless
No clue how I got it working before (on Linux), but the issue came up again. Originally I was looking at the wrong thing; the real issue is the dtype mismatch between Float and Half. The log of what happens is included below. Devs, on Windows I got this working by using edited pip packages with DLLs.
2024-04-29 20:16:33 | INFO | configs.config | Found GPU Intel(R) Arc(TM) A750 Graphics
2024-04-29 20:16:33 | INFO | configs.config | Use xpu:0 instead
2024-04-29 20:16:33 | INFO | configs.config | Half-precision floating-point: True, device: xpu:0
2024-04-29 20:16:38 | INFO | httpx | HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
2024-04-29 20:16:49 | INFO | infer.lib.rvcmd | checking hubret & rmvpe...
2024-04-29 20:16:53 | INFO | infer.lib.rvcmd | checking pretrained models...
2024-04-29 20:17:00 | INFO | infer.lib.rvcmd | checking pretrained models v2...
2024-04-29 20:17:08 | INFO | infer.lib.rvcmd | checking uvr5_weights...
2024-04-29 20:17:12 | INFO | infer.lib.rvcmd | all assets are already latest.
2024-04-29 20:17:12 | INFO | main | Use Language: en_US
Running on local URL: http://0.0.0.0:7865
2024-04-29 20:17:22 | INFO | infer.modules.vc.modules | Get sid: Harley.pth
2024-04-29 20:17:22 | INFO | infer.modules.vc.modules | Loading: assets/weights/Harley.pth
2024-04-29 20:17:25 | INFO | infer.modules.vc.modules | Select index: logs\Harley\added_IVF3130_Flat_nprobe_1_Harley_v2.index
2024-04-29 20:17:48 | INFO | fairseq.tasks.hubert_pretraining | current directory is F:\rvc
2024-04-29 20:17:48 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-04-29 20:17:48 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
2024-04-29 20:17:52 | INFO | infer.modules.vc.pipeline | Loading rmvpe model,assets/rmvpe/rmvpe.pt
FFT WARNING: INPUT_STRIDES and OUTPUT_STRIDES are deprecated: please use FWD_STRIDES and BWD_STRIDES, instead.
FFT WARNING: INPUT_STRIDES and OUTPUT_STRIDES are deprecated: please use FWD_STRIDES and BWD_STRIDES, instead.
2024-04-29 20:18:18 | WARNING | infer.modules.vc.modules | Traceback (most recent call last):
File "F:\rvc\infer\modules\vc\modules.py", line 188, in vc_single
audio_opt = self.pipeline.pipeline(
File "F:\rvc\infer\modules\vc\pipeline.py", line 428, in pipeline
self.vc(
File "F:\rvc\infer\modules\vc\pipeline.py", line 237, in vc
logits = model.extract_features(**inputs)
File "F:\rvc.venv\lib\site-packages\fairseq\models\hubert\hubert.py", line 535, in extract_features
res = self.forward(
File "F:\rvc.venv\lib\site-packages\fairseq\models\hubert\hubert.py", line 467, in forward
x, _ = self.encoder(
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1003, in forward
x, layer_results = self.extract_features(x, padding_mask, layer)
File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1049, in extract_features
x, (z, lr) = layer(
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\fairseq\models\wav2vec\wav2vec2.py", line 1260, in forward
x, attn = self.self_attn(
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "F:\rvc.venv\lib\site-packages\fairseq\modules\multihead_attention.py", line 538, in forward
return F.multi_head_attention_forward(
File "F:\rvc.venv\lib\site-packages\torch\nn\functional.py", line 5443, in multi_head_attention_forward
attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
I am sorry for reopening this, and I hope someone can point me in the right direction. Thank you again.
I'm having the exact same issue with a WSL build (incidentally if you have any advice on how to do the witchcraft to get it to run on Windows by itself, I'd love to hear it).
I am having the same issue on Linux Mint 22.04 and Intel Arc A770. Any help from the dev would be appreciated! @RVC-Boss
Also, having the GitHub bot mark the issue as stale and close it adds nothing but frustration.
In your RVC directory, open the file infer/lib/infer_pack/models.py and go to line 382 (the line number should be about the same; the existing line has no .to('cpu') at the end). Change that line to tmp_over_one.transpose(2, 1).to('cpu')
A few lines down (line 386 to be exact), edit the line so that it reads ).transpose(2, 1).to(tmp_over_one.device)
Keep all indentation intact.
This is what worked for me, and I thank @a-One-Fan for this information.
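The two edits above amount to running one operation on the CPU and then moving the result back to the device the tensor came from. Here is a standalone sketch of that pattern. Only the two `.to(...)` calls come from the instructions above. The surrounding `F.interpolate` call, its arguments, and the function name `interpolate_on_cpu` are my assumption about what the code around line 382 looks like, so check them against your own copy of models.py.

```python
import torch
import torch.nn.functional as F


def interpolate_on_cpu(tmp_over_one: torch.Tensor, upp: int) -> torch.Tensor:
    """Upsample a (batch, time, channels) tensor on the CPU, then return it to its device.

    The interpolate call and its arguments are assumed for illustration; the
    .to('cpu') and .to(tmp_over_one.device) calls are the edits from the thread.
    """
    return F.interpolate(
        tmp_over_one.transpose(2, 1).to("cpu"),   # edit 1: move the input to the CPU
        scale_factor=float(upp),
        mode="linear",
        align_corners=True,
    ).transpose(2, 1).to(tmp_over_one.device)     # edit 2: move the result back


if __name__ == "__main__":
    x = torch.zeros(1, 4, 1)  # on a machine without an Intel GPU this is already on the CPU
    y = interpolate_on_cpu(x, upp=3)
    print(y.shape, y.device)
```

The cost is an extra device-to-host copy on every call, which is presumably acceptable here because it sidesteps whatever the xpu backend does with this operation.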
I forgot to mention it earlier: the fix I posted above solved my original problem. I am closing this issue as a result, and hopefully it helps others too.