
rwkv5 fail to run on iGPU

juan-OY opened this issue 1 year ago · 4 comments

The RWKV 5 model cannot run on a Gen 12 iGPU:

```
(RWKV-py310) a770@RPLP-A770:~/ouyang/rwkv/models$ python generate_rwkv5.py --repo-id-or-model-path /home/a770/ouyang/rwkv/models/rwkv-5-world/
/home/a770/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''
If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
2024-02-21 22:02:52,874 - INFO - intel_extension_for_pytorch auto imported
******* loading model: /home/a770/ouyang/rwkv/models/rwkv-5-world/
2024-02-21 22:04:32,151 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_rwkv5.Rwkv5ForCausalLM'>
Can not read the prompt file, please check the file path.
error: LLVM ERROR: VISA builder API call failed: CisaBuilder->Compile("genxir", &ss, BC->emitVisaOnly())
```

```
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [12th Gen Intel(R) Core(TM) i7-12700]
Registry and code: 13 MB
Command: python generate_rwkv5.py --repo-id-or-model-path /home/a770/ouyang/rwkv/models/rwkv-5-world/
Uptime: 230.127147 s
Aborted
```

juan-OY avatar Feb 21 '24 14:02 juan-OY

Can we try this? https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/KeyFeatures/multi_gpus_selection.html#oneapi-device-selector
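For reference, the approach from that page boils down to restricting the oneAPI runtime to one device via an environment variable. A minimal sketch (the device index `1` is illustrative; the correct index depends on the machine's `sycl-ls` output):

```shell
# Inspect the available devices first (unset the selector so every device is listed):
#   unset ONEAPI_DEVICE_SELECTOR && sycl-ls
# Then pin this shell to a single Level Zero GPU by index before launching the script:
export ONEAPI_DEVICE_SELECTOR=level_zero:1
echo "$ONEAPI_DEVICE_SELECTOR"
```

The variable must be set before the SYCL runtime initializes, i.e. before the Python process that imports IPEX is started.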

jason-dai avatar Feb 22 '24 02:02 jason-dai

After exporting `ONEAPI_DEVICE_SELECTOR=level_zero:1`, it still fails:

```
(RWKV-py310) a770@RPLP-A770:~/ouyang/rwkv/models$ sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:1.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.26241]
```

```
(RWKV-py310) a770@RPLP-A770:~/ouyang/rwkv/models$ python generate_rwkv5.py --repo-id-or-model-path /home/a770/ouyang/rwkv/models/rwkv-5-world/
/home/a770/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''
If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
2024-02-22 10:44:16,572 - INFO - intel_extension_for_pytorch auto imported
******* loading model: /home/a770/ouyang/rwkv/models/rwkv-5-world/
2024-02-22 10:49:01,157 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_rwkv5.Rwkv5ForCausalLM'>
Can not read the prompt file, please check the file path.
onednn_verbose,info,oneDNN v3.3.0 (commit 887fb044ccd6308ed1780a3863c2c6f5772c94b3)
onednn_verbose,info,cpu,runtime:threadpool,nthr:10
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) UHD Graphics 770,driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,info,experimental features are enabled
onednn_verbose,info,use batch_normalization stats one pass is enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,common,error,level_zero,errcode 1879048196
Traceback (most recent call last):
  File "/home/a770/ouyang/rwkv/models/generate_rwkv5.py", line 96, in <module>
    output = model.generate(input_ids,
  File "/home/a770/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 1613, in generate
    return self.sample(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 2697, in sample
    outputs = self(
  File "/home/a770/ouyang/rwkv/models/benchmark_util.py", line 533, in __call__
    return self.model(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 820, in forward
    rwkv_outputs = self.rwkv(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/bigdl/llm/transformers/models/rwkv5.py", line 305, in rwkv_model_forward
    return origin_rwkv_model_forward(
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 708, in forward
    hidden_states, state, attentions = block(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/modeling_rwkv5.py", line 417, in forward
    attention, state = self.attention(self.ln1(hidden), state=state, use_cache=use_cache, seq_mode=seq_mode)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/bigdl/llm/transformers/models/rwkv5.py", line 178, in rwkv_attention_forward
    receptance, key, value, gate, state = extract_key_value(self, hidden, state)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/bigdl/llm/transformers/models/rwkv5.py", line 64, in extract_key_value
    key = self.key(key)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/miniconda3/envs/RWKV-py310/lib/python3.10/site-packages/bigdl/llm/transformers/low_bit_linear.py", line 521, in forward
    result = linear_q4_0.forward_new(x_2d, self.weight.data, self.weight.qtype,
RuntimeError: could not create a primitive
```

juan-OY avatar Feb 22 '24 02:02 juan-OY

I believe this is an IPEX issue: when both a dGPU and an iGPU are present, IPEX always selects the first device in the list as the execution platform, even if the tensors live on another device. https://github.com/intel/intel-extension-for-pytorch/issues/536
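Until that IPEX issue is resolved, one possible workaround is to pin the device selector from inside the launcher script itself, before `torch` or `intel_extension_for_pytorch` are imported, since the SYCL runtime reads `ONEAPI_DEVICE_SELECTOR` at initialization. A minimal sketch under that assumption; `select_level_zero_device` is a hypothetical helper, and the device index depends on your `sycl-ls` output:

```python
import os

def select_level_zero_device(index: int) -> str:
    """Hypothetical helper: restrict oneAPI runtimes to one Level Zero GPU.

    Must run before importing torch / intel_extension_for_pytorch, because
    the runtime reads ONEAPI_DEVICE_SELECTOR once, at initialization.
    """
    value = f"level_zero:{index}"
    os.environ["ONEAPI_DEVICE_SELECTOR"] = value
    return value

# Pin the process to GPU index 0 (illustrative), then import the GPU stack:
selector = select_level_zero_device(0)
# import torch
# import intel_extension_for_pytorch as ipex  # now sees only the selected device
```

Whether this sidesteps the wrong-device selection depends on the IPEX bug above; on a mixed dGPU/iGPU box it may still pick the first visible device.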

Could you try this on a single-iGPU machine?

leonardozcm avatar Feb 23 '24 01:02 leonardozcm

It worked on a single iGPU.

juan-OY avatar Feb 26 '24 14:02 juan-OY