MTL Linux Qwen-VL: LLVM ERROR: GenXCisaBuilder failed
Followed the guide to set up Qwen-VL (ipex-llm version 2.1.0b20240515, Python 3.9): https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/chat.py
Downloaded the model from https://hf-mirror.com/Qwen/Qwen-VL-Chat-Int4/tree/main
(nb_dev) intel@intel-Meteor-Lake-Client-Platform:~/lei/ipex-llm/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl$ python chat.py
2024-05-16 00:15:59,668 - INFO - Note: NumExpr detected 22 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-05-16 00:15:59,668 - INFO - NumExpr defaulting to 8 threads.
2024-05-16 00:16:00,056 - WARNING - CUDA extension not installed.
2024-05-16 00:16:00,057 - WARNING - CUDA extension not installed.
2024-05-16 00:16:02,304 - INFO - intel_extension_for_pytorch auto imported
Using disable_exllama is deprecated and will be removed in version 4.37. Use use_exllama instead and specify the version with exllama_config.The value of use_exllama will be overwritten by disable_exllama passed in GPTQConfig or stored in your config file.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 6.02it/s]
-------------------- Session 1 --------------------
Please input a picture: dog_cat.jpg
Please enter the text: what is the picture?
error: LLVM ERROR: GenXCisaBuilder failed for: < %.esimd133 = tail call <128 x float> @llvm.genx.dpas2.v128f32.v128f32.v128i32.v64i32(<128 x float> %.sroa.0196.027, <128 x i32> %.decomp.0, <64 x i32> %.esimd132, i32 10, i32 10, i32 8, i32 8, i32 1, i32 1)>: Intrinsic is not supported by <XeLPG> platform
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python chat.py
Uptime: 72.320985 s
We still need to do some adaptation work in ipex-llm for this GPTQ-quantized Qwen-VL model (https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/chat.py): as the error above shows, its kernels compile down to a dpas intrinsic that the Xe-LPG iGPU in Meteor Lake does not support. If your goal is to run Qwen-VL on MTL Linux, we recommend using save_low_bit on another machine with sufficient memory to convert the original Qwen-VL-Chat into an INT4 model in ipex-llm format, and then loading that model on MTL Linux.
First, check that your Linux GPU driver and Level Zero versions match the requirements in https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux
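As a quick sanity check after installing the driver stack (a minimal sketch; the exact device name printed depends on your machine), you can verify that PyTorch sees the MTL iGPU as an XPU device:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the 'xpu' backend

# Confirm the Level Zero driver is visible to PyTorch before loading any model.
print(torch.xpu.is_available())       # expect: True
print(torch.xpu.get_device_name(0))   # expect: the Meteor Lake iGPU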
Then SAVE Qwen-VL (run this on a machine with sufficient memory, starting from the original Qwen-VL-Chat checkpoint, not the Int4 GPTQ one):

import torch
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-VL-Chat"  # or a local path to the original checkpoint

# Load the original model and quantize it to ipex-llm INT4 on the fly.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True,
                                             modules_to_not_convert=['c_fc', 'out_proj'],
                                             torch_dtype=torch.float32)
# Save the low-bit weights so MTL can load them without re-quantizing.
model.save_low_bit(model_path + "-ipex-int4")
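Copy the resulting model_path + "-ipex-int4" directory to the MTL machine. Note that the tokenizer files are typically not part of the low-bit save, so keep the original checkpoint's tokenizer files available as well (or load the tokenizer from the original model_path).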
Then LOAD it on MTL Linux (note the path must match what save_low_bit wrote above):

from ipex_llm.transformers import AutoModelForCausalLM

# The weights are already quantized, so no load_in_4bit is needed here.
model = AutoModelForCausalLM.load_low_bit(model_path + "-ipex-int4",
                                          trust_remote_code=True,
                                          modules_to_not_convert=['c_fc', 'out_proj'],
                                          torch_dtype=torch.float32)
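For reference, a minimal chat sketch on the XPU, following the query format from the Qwen-VL model card ('demo.jpg' is a hypothetical local image path, and the tokenizer is assumed to load from the original model_path):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to('xpu')  # move the low-bit model to the MTL iGPU

# Build a multimodal (image + text) query the way Qwen-VL expects.
query = tokenizer.from_list_format([
    {'image': 'demo.jpg'},
    {'text': 'what is it?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)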
Output on MTL Linux:
-------------------- Session 1 --------------------
Please input a picture: test.jpg
Please enter the text: what is it
---------- Response ----------
-------------------- Session 1 --------------------
Please input a picture: pic.jpg
Please enter the text: what is it?
---------- Response ----------
This is an anime scene. Against a blue sky, a black boy with wings on his back stands with his arms raised. Next to him is a white boy with a black cardigan and a tie.