MTL Linux Qwen-VL: LLVM ERROR: GenXCisaBuilder failed
Followed the guide to set up Qwen-VL (ipex-llm version 2.1.0b20240515, Python 3.9): https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/chat.py
Downloaded the model from https://hf-mirror.com/Qwen/Qwen-VL-Chat-Int4/tree/main
(nb_dev) intel@intel-Meteor-Lake-Client-Platform:~/lei/ipex-llm/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl$ python chat.py
2024-05-16 00:15:59,668 - INFO - Note: NumExpr detected 22 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-05-16 00:15:59,668 - INFO - NumExpr defaulting to 8 threads.
2024-05-16 00:16:00,056 - WARNING - CUDA extension not installed.
2024-05-16 00:16:00,057 - WARNING - CUDA extension not installed.
2024-05-16 00:16:02,304 - INFO - intel_extension_for_pytorch auto imported
Using disable_exllama is deprecated and will be removed in version 4.37. Use use_exllama instead and specify the version with exllama_config.The value of use_exllama will be overwritten by disable_exllama passed in GPTQConfig or stored in your config file.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 6.02it/s]
-------------------- Session 1 --------------------
Please input a picture: dog_cat.jpg
Please enter the text: what is the picture?
error: LLVM ERROR: GenXCisaBuilder failed for: < %.esimd133 = tail call <128 x float> @llvm.genx.dpas2.v128f32.v128f32.v128i32.v64i32(<128 x float> %.sroa.0196.027, <128 x i32> %.decomp.0, <64 x i32> %.esimd132, i32 10, i32 10, i32 8, i32 8, i32 1, i32 1)>: Intrinsic is not supported by <XeLPG> platform
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python chat.py
Uptime: 72.320985 s
We still need to do some adaptation work in ipex-llm for this GPTQ-quantized Qwen-VL model (https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/chat.py): as the error above shows, its kernels compile down to a dpas intrinsic that the Xe-LPG iGPU in Meteor Lake does not support. If your goal is to run Qwen-VL on MTL Linux, we recommend using save_low_bit on another machine with sufficient memory to convert the original Qwen-VL-Chat into an INT4 model in ipex-llm format, and then loading that model on MTL Linux.
First, check that your Linux GPU driver and Level Zero versions match the requirements in https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux
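As a quick sanity check after installing the driver stack (a minimal sketch; the exact device name printed depends on your machine), you can verify that PyTorch sees the MTL iGPU as an XPU device:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the 'xpu' backend

# Confirm the Level Zero driver is visible to PyTorch before loading any model.
print(torch.xpu.is_available())       # expect: True
print(torch.xpu.get_device_name(0))   # expect: the Meteor Lake iGPU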
Then SAVE Qwen-VL (run this on a machine with sufficient memory, starting from the original Qwen-VL-Chat checkpoint, not the Int4 GPTQ one):

import torch
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-VL-Chat"  # or a local path to the original checkpoint

# Load the original model and quantize it to ipex-llm INT4 on the fly.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True,
                                             modules_to_not_convert=['c_fc', 'out_proj'],
                                             torch_dtype=torch.float32)
# Save the low-bit weights so MTL can load them without re-quantizing.
model.save_low_bit(model_path + "-ipex-int4")
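Copy the resulting model_path + "-ipex-int4" directory to the MTL machine. Note that the tokenizer files are typically not part of the low-bit save, so keep the original checkpoint's tokenizer files available as well (or load the tokenizer from the original model_path).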
Then LOAD it on MTL Linux (note the path must match what save_low_bit wrote above):

from ipex_llm.transformers import AutoModelForCausalLM

# The weights are already quantized, so no load_in_4bit is needed here.
model = AutoModelForCausalLM.load_low_bit(model_path + "-ipex-int4",
                                          trust_remote_code=True,
                                          modules_to_not_convert=['c_fc', 'out_proj'],
                                          torch_dtype=torch.float32)
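For reference, a minimal chat sketch on the XPU, following the query format from the Qwen-VL model card ('demo.jpg' is a hypothetical local image path, and the tokenizer is assumed to load from the original model_path):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to('xpu')  # move the low-bit model to the MTL iGPU

# Build a multimodal (image + text) query the way Qwen-VL expects.
query = tokenizer.from_list_format([
    {'image': 'demo.jpg'},
    {'text': 'what is it?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)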
Output on MTL Linux:
-------------------- Session 1 --------------------
Please input a picture: test.jpg
Please enter the text: what is it
---------- Response ----------
-------------------- Session 1 --------------------
Please input a picture: pic.jpg
Please enter the text: what is it?
---------- Response ----------
This is an anime scene. Against a blue sky, a black boy with wings on his back stands with his arms raised. Next to him is a white boy with a black cardigan and a tie.