Kai Huang

136 comments by Kai Huang

Fixed in: https://github.com/intel-analytics/ipex-llm/pull/10566

Hi, on https://www.modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary it seems the reported memory is for 1 token in and 2048/8192 tokens out. We will reproduce this result and update our results here.

For longer sequence inputs (1k/2k/4k, ...), bigdl-llm uses more memory than the official model. We will look into this; a sketch of how we measure this follows.
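
For reference, here is a minimal sketch of how such a peak-memory measurement could be scripted. It assumes ipex-llm's transformers-style loading API and IPEX's `torch.xpu` memory-statistics helpers (which mirror the CUDA ones); the model path, input lengths, and output length are placeholders, not our actual benchmark harness:

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the xpu backend
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "Qwen/Qwen-1_8B-Chat"  # placeholder; point to your local checkpoint

# Load with int4 weight quantization and move to the Intel GPU
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, load_in_4bit=True, trust_remote_code=True
).to("xpu")

for in_len in (1, 1024, 2048, 4096):
    # Synthetic input ids are enough for a memory measurement
    input_ids = torch.ones(1, in_len, dtype=torch.long).to("xpu")
    torch.xpu.reset_peak_memory_stats()
    with torch.inference_mode():
        model.generate(input_ids, max_new_tokens=2048)
    torch.xpu.synchronize()
    peak_gb = torch.xpu.max_memory_allocated() / 1024**3
    print(f"input={in_len:5d} tokens -> peak XPU memory {peak_gb:.2f} GB")
```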

Hi, sorry for the late reply. One difference is that the official int4 model uses w4a16, but previously, when you ran with ipex-llm, we were using w4a32,...
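
To make the w4a16 vs. w4a32 distinction concrete: in both cases the weights are quantized to 4-bit (w4); what differs is the activation precision (fp16 vs. fp32). A minimal sketch, assuming ipex-llm's transformers-style API, where casting the remaining float modules with `.half()` is how fp16 activations are typically obtained in the examples (load only one model at a time in practice):

```python
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "Qwen/Qwen-7B-Chat"  # placeholder path

# w4a32: int4 weights, fp32 activations (the earlier default behavior)
model_a32 = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, load_in_4bit=True, trust_remote_code=True
).to("xpu")

# w4a16: int4 weights, fp16 activations, matching the official int4 model
model_a16 = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, load_in_4bit=True, trust_remote_code=True
).half().to("xpu")
```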

Hi @sriraman2020 Sorry, Mixtral multi-GPU inference is not currently supported. We will update this issue if it becomes supported in the future.

Hi @AmberXu98 Does the error `Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)` you encountered mean you ran out of memory? Also, could you provide more information about your settings so that we can better...

Hi @AmberXu98 We are reproducing with your prompt again to double-confirm it in our environment. By the way, we want to confirm: is it the case that if you input...


@leonardozcm Please take a look.

[chat.txt](https://github.com/intel-analytics/BigDL/files/13948810/chat.txt) This is our code to test Qwen with context. For the 14B model, it seems we can currently only run one round of chat. Customer code: https://github.com/xusenlinzy/api-for-open-llm/tree/master
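
For anyone reproducing this, below is a minimal sketch of a multi-round chat test with accumulated context. It assumes the Qwen-Chat remote-code `model.chat(tokenizer, query, history=...)` interface; the model path and prompts are placeholders, not the contents of chat.txt:

```python
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "Qwen/Qwen-14B-Chat"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, load_in_4bit=True, trust_remote_code=True
).to("xpu")

history = None
for round_no, query in enumerate(
    ["Introduce yourself.", "Summarize your previous answer in one sentence."], 1
):
    # Each call feeds the full history back in, so memory use grows per round,
    # which is where the 14B model currently fails after round one
    response, history = model.chat(tokenizer, query, history=history)
    print(f"[round {round_no}] {response}")
```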