ENV setting issue for Arc A770 when running chatglm3-6b inference
The configuration below does not improve performance; it actually makes inference slower.

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
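For reference, these variables must be in the process environment before the SYCL/Level Zero runtime initializes. If you launch the benchmark from Python rather than a shell, a minimal sketch (assuming a plain torch + IPEX setup) is to set them before any GPU-related import:

```python
import os

# Must be set before torch / intel_extension_for_pytorch are imported,
# since the Level Zero runtime reads them at startup.
os.environ["USE_XETLA"] = "OFF"
os.environ["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401
```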
With these environment variables set, the performance is as follows:
```
2024-02-08 13:46:16,388 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'>
=========First token cost 4.7585 s and 5.5625 GB=========
=========Rest tokens cost average 0.0341 s (159 tokens in all) and 5.5625 GB=========
=========First token cost 0.3584 s and 5.5625 GB=========
=========Rest tokens cost average 0.0326 s (159 tokens in all) and 5.5625 GB=========
Inference time: 5.5508105754852295 s
```
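For context, the "Converting the current model to sym_int4 format" line is printed when the model is loaded through BigDL-LLM's 4-bit path. A minimal sketch of such a load (the model path and device placement here are illustrative, not taken from the benchmark script):

```python
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

# load_in_4bit=True quantizes the weights to sym_int4 on load.
model = AutoModel.from_pretrained("THUDM/chatglm3-6b",
                                  load_in_4bit=True,
                                  trust_remote_code=True)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b",
                                          trust_remote_code=True)
```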
With these environment variables removed, the performance is as follows:
```
2024-02-08 13:47:39,833 - INFO - Converting the current model to sym_int4 format......
<class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'>
=========First token cost 4.1724 s and 5.5625 GB=========
=========Rest tokens cost average 0.0205 s (159 tokens in all) and 5.5625 GB=========
=========First token cost 0.3476 s and 5.5625 GB=========
=========Rest tokens cost average 0.0202 s (159 tokens in all) and 5.5625 GB=========
Inference time: 3.5677762031555176 s
```
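The doubled First/Rest lines in each log (a slow pair followed by a fast pair) are consistent with a warm-up run followed by the measured run, since the first run includes one-time kernel compilation. Below is a generic sketch of how first-token and rest-token latencies can be measured; it is not the all-in-one benchmark's actual code, and `time_first_and_rest` is a hypothetical helper:

```python
import time

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  # enables torch.xpu

def time_first_and_rest(model, tokenizer, prompt, max_new_tokens=160):
    """Hypothetical helper: returns (first-token latency, avg rest-token latency)."""
    inputs = tokenizer(prompt, return_tensors="pt").to("xpu")

    # First token: time a single decoding step.
    torch.xpu.synchronize()
    t0 = time.perf_counter()
    model.generate(**inputs, max_new_tokens=1)
    torch.xpu.synchronize()
    first = time.perf_counter() - t0

    # Full generation; attribute everything beyond the first step to the rest
    # tokens, assuming the first step costs about the same as the run above.
    t0 = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.xpu.synchronize()
    total = time.perf_counter() - t0

    n_rest = out.shape[1] - inputs["input_ids"].shape[1] - 1
    return first, (total - first) / max(n_rest, 1)
```

Calling the helper once untimed before recording numbers reproduces the warm-up pattern seen in the logs.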
Hi @Fred-cell
It seems that I can't reproduce your issue on our machine (arc09:/home/arda/kai/BigDL/python/llm/dev/benchmark/all-in-one, kernel 6.2, ipex 2.1, bigdl-llm 2.5.0b20240207).
With these two environment variables, the rest-token latency on our machine improves by 1~2 ms. We will verify this issue on your machine after the holiday.
Fixed in https://github.com/intel-analytics/ipex-llm/pull/10566