Zhao Changmin
Thank you for your feedback; you may try the latest ipex-llm[cpp] (version >= 10.17) **tomorrow**.
Hi everyone, we have made another attempt and it works on our test cases. Please try again tomorrow with the latest ipex-llm[cpp].
Hi, could you try with the latest ipex-llm (>= 0325)?
> Yes, FP8 with batch 8 is fixed, but there is the same issue for INT4.
>
> 0,THUDM/chatglm3-6b,1223.31,**47.75**,0.0,1024-512,8,634-174,1,sym_int4,N/A,8.43,5.236328125,N/A

Sorry, I cannot reproduce this:

```
,model,1st token avg latency (ms),2+...
```
Hi @xduzhangjiayu, for ipex-llm >= 2.1.0b20240704 you may try:

```python
model.save_low_bit(model_path)
```

to save the low-bit model, and

```python
AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True)
```

to load the low-bit model.
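For context, here is a minimal end-to-end sketch of that save/load flow. It assumes the model is first loaded and quantized via ipex-llm's `from_pretrained` with `load_in_4bit=True`; the checkpoint name and paths are placeholders for illustration:

```python
# Minimal sketch, assuming ipex-llm >= 2.1.0b20240704.
from ipex_llm.transformers import AutoModelForCausalLM

# Load the original checkpoint with 4-bit quantization applied on load
# (checkpoint name taken from this thread; paths are placeholders).
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm3-6b",
    load_in_4bit=True,
    trust_remote_code=True,
)

# Persist the already-quantized weights so later loads skip re-quantization.
model.save_low_bit("./chatglm3-6b-low-bit")

# Later: load the low-bit checkpoint directly.
model = AutoModelForCausalLM.load_low_bit(
    "./chatglm3-6b-low-bit",
    trust_remote_code=True,
)
```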
> does the issue still exist?

I encounter this issue too on the `xpu_master` branch.