ipex-llm

Latency and OOM issues when testing Qwen-14B-Chat-INT4 on a NUC (12G i7 + A770M 16G) with Ubuntu 22.04.3

Status: Open. darshanhuang1 opened this issue 1 year ago • 6 comments

We tested Qwen-14B-Chat-INT4 on a NUC (12G i7 + A770M 16G) running Ubuntu 22.04.3. Compared with NVIDIA's 3090 GPU, we found the following issues:

  1. A single conversation round takes too long (over 3 seconds); to meet our actual requirements it should take 1-2 seconds.
  2. When the input reaches around 1,500 tokens, the GPU runs out of memory (OOM), but we expect to support 8K input tokens.

So, could you please help with the following?

  1. Is there any optimization to reduce the latency?
  2. Do you have any recommendations for reducing GPU memory usage, and what is the maximum input length (in tokens) for this model on this hardware?
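To compare a round against the 1-2 second target, it can help to separate first-token latency (prefill, which grows with prompt length) from per-token decode latency. The sketch below is a minimal timing helper under stated assumptions: `stream_tokens` is a hypothetical callable that yields generated tokens one at a time (for example, a wrapper around a Hugging Face `TextIteratorStreamer`), and nothing here is a bigdl-llm or ipex-llm API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class LatencyReport:
    first_token_s: float  # time until the first token arrives (prefill)
    total_s: float        # time for the whole response
    num_tokens: int       # number of tokens generated

    @property
    def decode_s_per_token(self) -> float:
        # Average time per token after the first one (decode phase).
        if self.num_tokens <= 1:
            return 0.0
        return (self.total_s - self.first_token_s) / (self.num_tokens - 1)


def measure_latency(stream_tokens: Callable[[], Iterable[str]],
                    clock: Callable[[], float] = time.perf_counter) -> LatencyReport:
    """Time one chat round.

    `stream_tokens` is a hypothetical callable that yields generated tokens
    one at a time; `clock` is injectable so the helper can be tested.
    """
    start = clock()
    first = None
    count = 0
    for _ in stream_tokens():
        count += 1
        if first is None:
            first = clock() - start  # first token received
    total = clock() - start
    return LatencyReport(first_token_s=first if first is not None else total,
                         total_s=total, num_tokens=count)
```

If first-token latency dominates, the prompt length is the main lever; if the per-token decode time dominates, the output length and the decode speed of the GPU matter more.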

darshanhuang1, Dec 13 '23

  1. Qwen-14B-Chat performs well on our machines. Here is our configuration:
     - Machine: i9-14900K, Arc A770, 64 GB DDR5 memory (Linux)
     - BigDL version: 2.5.0b20231213
     - Kernel version: 5.19.0-41-generic
     - Test code: https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen

     I suspect the root cause is the kernel version. What is your kernel version?
  2. Currently, Qwen-14B-Chat runs out of memory when the input length exceeds 1024 tokens.
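One rough way to reason about the maximum input length is to estimate how the key/value (KV) cache grows with sequence length. The sketch below is a back-of-the-envelope estimate, assuming a standard multi-head attention layout, an FP16 cache (2 bytes per element), and Qwen-14B's configuration of 40 layers and hidden size 5120 (verify against the model's `config.json`). It ignores the INT4 weights, prefill activations, and allocator overhead, so it only shows what the cache alone adds, not where OOM will occur.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, hidden_size: int,
                   bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """Size of the key/value cache: one K and one V tensor per layer,
    each of shape [batch_size, seq_len, hidden_size]."""
    return 2 * num_layers * batch_size * seq_len * hidden_size * bytes_per_elem


GIB = 1024 ** 3

if __name__ == "__main__":
    # Assumed Qwen-14B config (check config.json): 40 layers, hidden size 5120.
    for tokens in (1024, 1500, 8192):
        size = kv_cache_bytes(tokens, num_layers=40, hidden_size=5120)
        print(f"{tokens:>5} tokens -> {size / GIB:.2f} GiB of FP16 KV cache")
```

Under these assumptions each token costs 819,200 bytes of cache, so an 8K-token context needs 6.25 GiB for the cache alone, on top of the INT4 weights and the prefill activations.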

Ricky-Ting, Dec 15 '23

Here is how to downgrade the Linux kernel: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/ch_6_GPU_Acceleration/environment_setup.md
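Before and after downgrading, the running kernel can be checked from Python with the standard-library `platform.release()`. The sketch below is minimal: `parse_kernel_version` is a hypothetical helper (not part of bigdl-llm), and the 5.19 reference point comes from the working configuration reported in this thread (5.19.0-41-generic).

```python
import platform
import re
from typing import Tuple


def parse_kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like '5.19.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__":
    release = platform.release()  # e.g. a 6.2 kernel on stock Ubuntu 22.04.3
    try:
        version = parse_kernel_version(release)
    except ValueError:
        print(f"could not parse kernel release {release!r}")
    else:
        # 5.19 is the kernel in the working configuration reported in this thread.
        print(f"running kernel {release}; same major.minor as 5.19: {version[:2] == (5, 19)}")
```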

Ricky-Ting, Dec 15 '23

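Our kernel version is 6.2, the default in Ubuntu 22.04.3. We will downgrade it and re-test next week. By the way, the complete prompt is below.

Suppose you are a document question-answering assistant that replies to the user's questions based on the context. If you find a relevant or similar answer, reply concisely; if you cannot find an answer, reply "Sorry, the answer cannot be found in the context."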

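Li Xingyun
Li Xingyun's origins are a mystery; he drifted through the jianghu (the martial world) and was nine years old at the time. In the battle of Yuzhou City, Li Huan was killed, and Li Xingyun became a disciple of Yang Shuzi to study martial arts. Eight years later (the second year of Qianhua, 912 AD) he left the mountain and roamed the jianghu with his junior martial sister Lu Linxuan. Because he carried the Longquan Sword and his true identity was exposed, he was hunted by many factions. He later befriended Ji Ruxue, a maid of the Huanyin Fang, and Zhang Zifan, young master of the Tongwen Guan; the four turned from enemies into friends and roamed the jianghu together. [3]
Gender: Male
Age: 17
Chinese zodiac: Rabbit
Height: 1.78 m
Weight: 142 jin (71 kg)
Identity: Prince of the Great Tang
Weapons: longsword (lost after being knocked away by Jiang Zhaoyi's Flame Dragon Palm), Longquan Sword
Hidden weapon: Huayang needles
Formation: Spirit-Binding Formation
Martial arts: Qinglian Sword Song, Huayang Needle Technique, Longquan Seven Stars Art
Inner power: Minor Heaven rank → Middle Heaven rank (after learning the Longquan Seven Stars Art) → Great Heaven rank (after learning the Inner Canon)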

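Ji Ruxue
A maid of the Empress of the Huanyin Fang, she was ordered to seize the Longquan Sword carried by Li Xingyun, and later ordered to abduct Li Xingyun himself. After failing, she was instead rescued by Li Xingyun and fell in love with him. [3]
Personality: serious and rarely smiles. Appearance: somewhat cold and aloof.
Ordered to find the Fire Lingzhi, she was besieged by members of the Xuanming Sect and then rescued by Li Xingyun, after which she began to feel a faint affection for him. Later, ordered to seize the Longquan Sword, she was captured by the Buliangren and taken to Cangbing Valley, was rescued by Li Xingyun again, and fell for him. When Xuanjingtian and Miaochengtian were sent to attend to Li Xingyun, she seemed somewhat jealous, which showed exactly how she felt about him.
She was killed by a hail of arrows while protecting Li Xingyun, and was later revived.
Faction: Qi, Huanyin Fang
Gender: Female
Age: 18
Chinese zodiac: Tiger
Height: 1.70 m
Weight: 95 jin (47.5 kg)
Identity: maid of the Huanyin Fang, Li Xingyun's wife
Weapon: Suxin Sword (from the Youyun Jinglong setting collection)
Martial arts: Huanyin Art
Inner power: Middle Star rank → Great Star rank (after taking the Fire Lingzhi)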

Question: Which martial arts does Li Xingyun know?
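The prompt above follows a common document-QA layout: an instruction, the context passages, then the question. Given the 1024-token input limit mentioned earlier in the thread, the sketch below assembles such a prompt while keeping it under a token budget. `build_prompt` and `count_tokens` are hypothetical names; in practice `count_tokens` might be `lambda s: len(tokenizer.encode(s))` with the model's own tokenizer. Dropping whole trailing passages is one simple strategy chosen for illustration, not the approach used in this thread.

```python
from typing import Callable, List, Sequence


def build_prompt(instruction: str, passages: Sequence[str], question: str,
                 count_tokens: Callable[[str], int], max_tokens: int) -> str:
    """Assemble instruction + passages + question, dropping trailing
    passages until the prompt fits within `max_tokens`.

    If even the bare instruction and question exceed the budget, they are
    returned without any passages.
    """
    kept: List[str] = list(passages)  # copy so the caller's sequence is untouched
    while True:
        prompt = "\n\n".join([instruction, *kept, f"Question: {question}"])
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt
        kept.pop()  # drop the last passage and try again
```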

darshanhuang1, Dec 15 '23

Here is the updated guide to setting up the bigdl-llm environment: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/ch_6_GPU_Acceleration/environment_setup.md

qiuxin2012, Dec 15 '23

chat.txt: our code for testing Qwen with context. With the 14B model, it seems we can currently run only one round of chat.

Customer code: https://github.com/xusenlinzy/api-for-open-llm/tree/master

hkvision, Jan 16 '24

Update: by emptying the GPU cache after each chat round, we can run all the questions.
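The workaround of releasing cached GPU memory between rounds can be sketched as a simple loop. Assumptions: `answer` is a hypothetical callable wrapping one generation call and returning plain text; on Intel GPUs with PyTorch XPU support (for example via intel_extension_for_pytorch), `torch.xpu.empty_cache()` releases cached allocator blocks. The lookup is done lazily and guarded, so the sketch also runs on machines without a GPU. This is an illustration, not the attached chat.txt.

```python
from typing import Callable, Iterable, List, Optional


def default_empty_cache() -> None:
    """Release cached blocks on an Intel GPU if PyTorch XPU support is present."""
    try:
        import torch
    except ImportError:
        return
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and hasattr(xpu, "empty_cache"):
        xpu.empty_cache()


def run_chat_rounds(questions: Iterable[str],
                    answer: Callable[[str], str],
                    empty_cache: Optional[Callable[[], None]] = None) -> List[str]:
    """Answer each question in turn, releasing cached GPU memory after every round."""
    release = empty_cache or default_empty_cache
    replies = []
    for question in questions:
        replies.append(answer(question))
        # Return cached blocks so the next round starts with more free memory.
        release()
    return replies
```

Note that emptying the cache only returns memory that no live tensor still references, so `answer` should not keep the previous round's output tensors or KV cache alive between calls.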

hkvision, Jan 18 '24

It is a GPU memory limitation; the software is fine.

darshanhuang1, Apr 12 '24