ipex-llm
Latency and OOM issues when testing Qwen-14B-Chat-INT4 on a NUC (12th Gen i7 + A770M 16G) on Ubuntu 22.04.3
We tested Qwen-14B-Chat-INT4 on a NUC (12th Gen i7 + A770M 16G) running Ubuntu 22.04.3. Compared with NVIDIA's 3090 GPU, we found the issues below.
- A conversation round takes too long, over 3 seconds; we need 1-2 seconds to meet our actual requirements.
- When the input reaches around 1500 tokens, the GPU runs out of memory (OOM), but we expect to support 8K input tokens.
Could you please help with the following?
- Is there any optimization to improve the latency?
- Is there any recommendation to reduce GPU memory usage, and what is the maximum number of input tokens for this model and hardware?
- The performance of Qwen-14B-Chat on our machines is good. Here is our configuration:
  - Machine: i9 14900K; Arc A770; 64 GB DDR5 memory (Linux)
  - bigdl version: 2.5.0b20231213
  - Kernel version: 5.19.0-41-generic
  - Test code: https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen
  I suspect the root cause is the kernel version. What is your kernel version?
- Currently, Qwen-14B-Chat will OOM if the input length is over 1024 tokens.
Here is how to downgrade the Linux kernel: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/ch_6_GPU_Acceleration/environment_setup.md
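To answer the kernel question quickly, the running kernel can be compared against the 5.19 series reported as working above. This is only a minimal sketch: the helper names are hypothetical, and 5.19 is simply the version from the configuration in this thread, not an official requirement.

```python
import platform
import re


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '5.19.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def same_series_as_working_setup(release, working=(5, 19)):
    """True if the kernel is in the same major.minor series as the setup reported to work.

    The default (5, 19) comes from the configuration posted in this thread;
    it is an assumption for illustration, not a documented requirement.
    """
    return kernel_major_minor(release) == working


def current_kernel_release():
    """Return the release string of the running kernel (same value as `uname -r` on Linux)."""
    return platform.release()
```

For example, `same_series_as_working_setup(current_kernel_release())` would be `False` on the default Ubuntu 22.04.3 kernel (6.2) and `True` on 5.19.0-41-generic.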
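Our kernel version is the default 6.2 in Ubuntu 22.04.3; we will downgrade it and re-test next week. BTW, the complete prompt (translated from Chinese) is below. Assume you are a document Q&A assistant that answers the user's question based on the context. If you find a relevant or similar answer, reply concisely; if you cannot find an answer, reply "Sorry, the answer cannot be found in the context."
Li Xingyun
Li Xingyun has a mysterious background and was cast adrift in the jianghu (martial world) at the age of nine. After Li Huan died in the battle of Yuzhou City, he became a disciple of Yang Shuzi and trained under him. Eight years later (Qianhua year 2, 912 AD) he left the mountain and traveled the jianghu with his junior sister Lu Linxuan. Because he carries the Longquan Sword and his identity was exposed, he is hunted by various factions. He later met Ji Ruxue, a maid of Huanyin Fang, and Zhang Zifan, the young master of Tongwen Guan; the four turned from enemies into friends and roamed the jianghu together. [3]
Gender: Male
Age: 17
Zodiac: Rabbit
Height: 1.78 m
Weight: 142 jin
Identity: Prince of the Tang dynasty
Weapons: long sword (lost after being knocked away by Jiang Zhaoyi's Flame Dragon Palm), Longquan Sword
Hidden weapon: Huayang Needles
Formation: Spirit-Binding Formation
Martial arts: Qinglian Sword Song, Huayang Needle Technique, Longquan Seven Stars Art
Inner power: Minor Heaven rank → Middle Heaven rank (after learning the Longquan Seven Stars Art) → Major Heaven rank (after learning the Inner Canon)
Ji Ruxue
A maid of the Empress of Huanyin Fang. She was ordered to seize the Longquan Sword carried by Li Xingyun, and later ordered to abduct Li Xingyun. After failing, she was instead rescued by Li Xingyun and fell in love with him. [3]
Personality: serious and rarely smiles. Appearance: somewhat aloof.
Ordered to find the Fire Lingzhi, she was besieged by members of the Xuanming Sect and then rescued by Li Xingyun, faintly developing feelings for him. Later, ordered to seize the Longquan Sword, she was abducted by the Buliangren to Cangbing Valley and was again rescued by Li Xingyun, falling for him. When Xuan Jingtian and Miao Chengtian were sent to attend to Li Xingyun, she appeared somewhat jealous, which revealed her feelings for him.
She was killed by a volley of arrows while protecting Li Xingyun and was later revived.
Faction: State of Qi, Huanyin Fang
Gender: Female
Age: 18
Zodiac: Tiger
Height: 1.70 m
Weight: 95 jin
Identity: maid of Huanyin Fang, Li Xingyun's wife
Weapon: Suxin Sword (from the Youyun Jinglong setting collection)
Martial arts: Huanyin Art
Inner power: Middle Star rank → Major Star rank (after taking the Fire Lingzhi)
Question: Which martial arts does Li Xingyun know?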
Here is the updated guide on how to set up the bigdl-llm environment: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/ch_6_GPU_Acceleration/environment_setup.md
chat.txt: our code to test Qwen with context. For the 14B model, it seems we can currently run only one round of chat.
Customer code: https://github.com/xusenlinzy/api-for-open-llm/tree/master
Update: By emptying the cache after each chat round, we can run all the questions.
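For reference, emptying the cache between rounds could look roughly like the sketch below. The thread does not show the exact call that was used, so this assumes `torch.xpu.empty_cache()` (available once the XPU backend is registered, for example after the test script imports `intel_extension_for_pytorch`). `release_device_cache`, `run_chat_rounds`, and `chat_fn` are hypothetical names for illustration.

```python
import gc


def release_device_cache():
    """Drop Python garbage and return cached device memory to the driver.

    Returns the backend that was cleared: 'xpu', 'cuda', 'cpu', or 'no-torch'.
    Assumption: the calling script has already set up its GPU backend.
    """
    gc.collect()  # release dangling tensor references first
    try:
        import torch
    except ImportError:
        return "no-torch"
    xpu = getattr(torch, "xpu", None)  # absent on builds without XPU support
    if xpu is not None and xpu.is_available():
        xpu.empty_cache()
        return "xpu"
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    return "cpu"


def run_chat_rounds(chat_fn, questions):
    """Call chat_fn on each question and clear the device cache after every round."""
    answers = []
    for question in questions:
        answers.append(chat_fn(question))
        release_device_cache()
    return answers
```

Here `chat_fn` stands in for whatever wraps the model call in chat.txt. Clearing the cache only returns memory that the allocator was holding but not using, so it helps across rounds and does not reduce the peak memory of a single long prompt.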
This is a GPU memory limitation; the software is fine.
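To get a rough sense of how much of the 16 GB the KV cache alone would need at a given input length, a back-of-the-envelope estimate can help. This is a sketch under stated assumptions: 40 layers and a hidden size of 5120 (the published Qwen-14B configuration), an FP16 cache, and batch size 1. The actual cache layout in bigdl-llm may differ, and the function name is hypothetical.

```python
def kv_cache_bytes(num_tokens, num_layers=40, hidden_size=5120,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size in bytes.

    Each layer stores one key and one value vector of hidden_size values
    per token, hence the factor of 2. The defaults are assumptions taken
    from the published Qwen-14B config with an FP16 cache.
    """
    return 2 * num_layers * hidden_size * bytes_per_value * num_tokens * batch_size


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for tokens in (1500, 8192):
        print(f"{tokens} tokens: {to_gib(kv_cache_bytes(tokens)):.2f} GiB")
    # 1500 tokens: 1.14 GiB
    # 8192 tokens: 6.25 GiB
```

Under these assumptions the cache for 8K tokens is about 6.25 GiB. That figure excludes the INT4 weights and the temporary activation and attention buffers, so it is only a lower bound on what an 8K prompt needs.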