Zhao Changmin
Thank you for your feedback; you may try the latest ipex-llm[cpp] (version >= 10.17) **tomorrow**.
Hi everyone, we have made another attempt and it works on our test cases. Please try again tomorrow with the latest ipex-llm[cpp].
Hi, could you try with the latest ipex-llm (>= 0325)?
> Yes, FP8 with batch 8 is fixed, but there is the same issue for INT4.
>
> 0,THUDM/chatglm3-6b,1223.31,**47.75**,0.0,1024-512,8,634-174,1,sym_int4,N/A,8.43,5.236328125,N/A

Sorry, I cannot reproduce this:

```
,model,1st token avg latency (ms),2+...
```
Hi @xduzhangjiayu, for ipex-llm >= 2.1.0b20240704 you may try:

```python
model.save_low_bit(model_path)
```

to save the low-bit model, and

```python
AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True)
```

to load the low-bit model.
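For context, here is a minimal end-to-end sketch of that save/load flow. It assumes the model is first loaded and quantized via ipex-llm's `from_pretrained` with `load_in_4bit=True`; the checkpoint name and paths are placeholders for illustration:

```python
# Minimal sketch, assuming ipex-llm >= 2.1.0b20240704.
from ipex_llm.transformers import AutoModelForCausalLM

# Load the original checkpoint with 4-bit quantization applied on load
# (checkpoint name taken from this thread; paths are placeholders).
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm3-6b",
    load_in_4bit=True,
    trust_remote_code=True,
)

# Persist the already-quantized weights so later loads skip re-quantization.
model.save_low_bit("./chatglm3-6b-low-bit")

# Later: load the low-bit checkpoint directly.
model = AutoModelForCausalLM.load_low_bit(
    "./chatglm3-6b-low-bit",
    trust_remote_code=True,
)
```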
> does the issue still exist?

I encounter this issue too on the `xpu_master` branch.