ipex-llm
transformers 4.38.1 gives bad llama3 performance on MTL iGPU
I'm running Llama 3 inference on an MTL Core Ultra 7 1003H iGPU on Ubuntu 22.04. I followed this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3 and used generate.py. The complete script is:
```bash
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
python ./generate.py --repo-id-or-model-path 'meta-llama/Meta-Llama-3-8B-Instruct' --prompt "some-1024-token-length-input" --n-predict 128
```
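For context, the "Inference time" figures below come from timing the generation call with a wall-clock timer. A minimal sketch of that pattern (the helper name and the stand-in workload are mine, not from the example's generate.py):

```python
import time

def timed_generate(generate_fn):
    # Hypothetical helper: wall-clock timing of one call, returns (result, seconds)
    start = time.perf_counter()
    result = generate_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for model.generate(...) so the sketch is self-contained
result, seconds = timed_generate(lambda: sum(range(1_000)))
print(f"Inference time: {seconds} s")
```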
I got very different performance on two transformers versions:

- transformers==4.37: Inference time: 11.824079990386963 s
- transformers==4.38.1: Inference time: 16.150665760040283 s
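For reference, the relative slowdown implied by the two timings above can be computed directly:

```python
t_437 = 11.824079990386963   # seconds, transformers==4.37
t_4381 = 16.150665760040283  # seconds, transformers==4.38.1
slowdown = (t_4381 - t_437) / t_437 * 100
print(f"4.38.1 is {slowdown:.1f}% slower than 4.37")  # roughly 36.6% slower
```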
Can you help me understand why transformers 4.38.1 performs so much worse?
My ipex-llm version is 2.1.0b20240521, and my oneAPI version is 2024.1.