Zhao Changmin

Results 46 comments of Zhao Changmin

Thank you for your feedback; you may try the latest ipex-llm[cpp] (version number >= 10.17) **tomorrow**.

Hi everyone, we have made another attempt and it is working on our test cases. Please try again tomorrow with the latest ipex-llm[cpp]. ![image](https://github.com/user-attachments/assets/63f48847-a175-439f-b003-9a82bca57f40)

> Yes, FP8 with batch 8 is fixed, but there is the same issue for INT4.
> `0,THUDM/chatglm3-6b,1223.31,**47.75**,0.0,1024-512,8,634-174,1,sym_int4,N/A,8.43,5.236328125,N/A`

Sorry that I cannot reproduce this:

```
,model,1st token avg latency (ms),2+...
```

Hi @xduzhangjiayu, for ipex-llm >= 2.1.0b20240704 you may try:

```python
model.save_low_bit(model_path)
```

to save the low-bit model, and

```python
AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True)
```

to load the low-bit model.
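Putting the two calls above together, a minimal sketch of a save-then-reload round trip might look like the following. The helper name `save_and_reload_low_bit` is hypothetical, and the sketch assumes an ipex-llm release at or after 2.1.0b20240704, where `save_low_bit` / `load_low_bit` are available:

```python
# Hypothetical helper (not part of ipex-llm): persist an already-quantized
# model in low-bit form and reload it, so later loads skip the
# quantization/conversion pass.

def save_and_reload_low_bit(model, model_path):
    # ipex_llm's AutoModelForCausalLM mirrors the Hugging Face transformers
    # class of the same name; import is deferred so this sketch only needs
    # ipex-llm installed when the helper is actually called.
    from ipex_llm.transformers import AutoModelForCausalLM

    # Write the low-bit weights to disk.
    model.save_low_bit(model_path)

    # Load them back directly in low-bit form.
    return AutoModelForCausalLM.load_low_bit(
        model_path, trust_remote_code=True
    )
```

The point of the round trip is that `load_low_bit` reads weights that are already quantized, which is faster and uses less memory than converting the original FP16 checkpoint on every startup.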

> does the issue still exist?

I encounter this issue too, on the `xpu_master` branch.