
Llama-3-8B causes an MTL iGPU runtime error when running AI inference with ipex-llm

Open zcwang opened this issue 2 months ago • 3 comments

Hello ipex-llm experts, I'm hitting an issue with Llama-3-8B on the MTL-H iGPU and would appreciate any advice. :)

The problem appears only on the iGPU of the MTL 155H; the ARC770 runs fine. Both were tested on Ubuntu 22.04 with kernel v6.8.2.

  • ARC770 (level_zero:0) works well
(llm-test) intel@mydevice:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:0 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-13 14:56:26,831 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11.45it/s]
2024-05-13 14:56:27,298 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Inference time: 1.4299554824829102 s
-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>


-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The fascinating history of Intel!

Intel Corporation, one of the world's leading semiconductor companies, has a rich history that spans over six decades. Here's a brief overview:

**Early Years (1957-1969)**

Intel was founded on July 18, 1957, by Gordon Moore and Robert Noy

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 12.174066 s
  • MTL iGPU (level_zero:1) fails with "RuntimeError: probability tensor contains either inf, nan or element < 0"
(llm-test) intel@mydevice:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:1 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-13 15:00:17,639 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11.61it/s]
2024-05-13 15:00:18,130 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
  File "/home/intel/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3/./generate.py", line 81, in <module>
    output = model.generate(input_ids,
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/ipex_llm/transformers/lookup.py", line 87, in generate
    return original_generate(self,
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
    return original_generate(self,
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/transformers/generation/utils.py", line 1520, in generate
    return self.sample(
  File "/home/intel/anaconda3/envs/rag-demo/lib/python3.9/site-packages/transformers/generation/utils.py", line 2653, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 11.134912 s
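For context, the failing call in the traceback is `torch.multinomial(probs, num_samples=1)`, which rejects any distribution containing `inf`, `nan`, or negative entries. Below is a minimal stdlib sketch of that precondition (`check_sampling_probs` is a hypothetical helper for illustration, not part of torch or ipex-llm):

```python
import math

def check_sampling_probs(probs):
    """Mimic the input check that makes torch.multinomial raise:
    every probability must be finite and non-negative."""
    for p in probs:
        if math.isnan(p) or math.isinf(p) or p < 0:
            raise RuntimeError(
                "probability tensor contains either `inf`, `nan` or element < 0"
            )
    return True

# A well-formed distribution passes.
check_sampling_probs([0.7, 0.2, 0.1])

# A distribution with a NaN entry (as softmax produces when the logits
# themselves contain NaN) triggers exactly this error class.
try:
    check_sampling_probs([0.7, math.nan, 0.1])
except RuntimeError as e:
    print("raised:", e)
```

If the iGPU compute path emits NaN logits, softmax propagates them into `probs` and sampling fails. Passing `do_sample=False` to `model.generate` would skip `torch.multinomial` (greedy decoding), but that only sidesteps the symptom; the NaN logits on the iGPU would remain the underlying problem.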

Environment info

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 155H OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO  [24.13.29138.7]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.29138]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]

intel_extension_for_pytorch    2.1.20+git0e2bee2
torch                          2.1.0.post0+cxx11.abi
torchvision                    0.16.0+fbb4cc5
sentence-transformers          2.3.1
transformers                   4.37.0
transformers-stream-generator  0.0.5

zcwang — May 13 '24 07:05