
Results 23 comments of alexsin368

@chsasank I installed IPEX 2.1.40+xpu with Python 3.11.9, same as you, but I am only able to reproduce 1 of the 2 issues you see. I'm getting 16.72 TFLOPS and...
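
For reference, this is roughly how I measured the matmul throughput; a minimal sketch assuming an XPU device is visible and IPEX is installed (matrix size, dtype, and iteration counts are placeholders, not the exact settings from the original report):

```python
import time
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)

# Placeholder benchmark settings; the original report may have used different ones.
N = 4096
dtype = torch.float16

a = torch.randn(N, N, dtype=dtype, device="xpu")
b = torch.randn(N, N, dtype=dtype, device="xpu")

# Warm up so first-run compilation and cache effects are excluded.
for _ in range(5):
    torch.matmul(a, b)
torch.xpu.synchronize()

iters = 50
start = time.time()
for _ in range(iters):
    torch.matmul(a, b)
torch.xpu.synchronize()
elapsed = time.time() - start

# A dense N x N matmul performs ~2*N^3 floating-point operations.
tflops = 2 * N**3 * iters / elapsed / 1e12
print(f"{tflops:.2f} TFLOPS")
```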

@chsasank The performance regression needs to be within the scope of IPEX itself for my team and me to continue debugging. Let's figure out whether the regression is indeed to...

Hi @yash3056, please describe your issue in detail and provide the code and steps to reproduce it.

@LeptonWu this issue could be related to https://github.com/intel/intel-extension-for-pytorch/issues/529 and my team members are looking into it.

@Pradeepa99 The release notes mention added support for the AWQ format, and it seems to be referring to the usage of ipex.llm.optimize, where you can specify the quant_method as 'gptq'...
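
As a rough illustration of what I mean by the ipex.llm.optimize path (a sketch only; the model id is a placeholder, and the low-precision-checkpoint keyword in the commented part is my assumption, so verify it against the LLM example scripts shipped with your IPEX release):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Placeholder model id; substitute the model from your own setup.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.bfloat16)
model.eval()

# Documented baseline usage: let IPEX apply its transformer-specific optimizations.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

# For a pre-quantized GPTQ/AWQ-format checkpoint, the LLM example scripts additionally
# pass the saved low-precision weights into ipex.llm.optimize. The keyword below is my
# assumption of how that looks; check the example scripts for your release:
#
# low_precision_checkpoint = torch.load("saved_int4_checkpoint.pt")  # hypothetical path
# model = ipex.llm.optimize(
#     model,
#     dtype=torch.bfloat16,
#     low_precision_checkpoint=low_precision_checkpoint,  # assumed keyword; the
# )                                                       # quant_method ("gptq") comes
#                                                         # from the checkpoint's config
```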

@Pradeepa99 yes, the test case example you found is what I meant. IPEX does not have an example similar to the GPTQ one you found. We recommend you use Intel...

@andyluo7 I will work on reproducing this issue and get back to you with findings. Did you try passing in THUDM/chatglm2-6b directly as the model?
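
For example, something along the lines of `python run_generation.py -m THUDM/chatglm2-6b --benchmark` is what I had in mind; the exact flags depend on the example script and the release you're running, so treat that as a sketch rather than the definitive invocation.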

Issue reproduced. What version of transformers are you using? I have 4.37.0. I will be working with the team to resolve your issue.
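
If you're not sure which version you have installed, `python -c "import transformers; print(transformers.__version__)"` will tell you.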

@andyluo7 I found out what's causing the issue. It occurs when you pass in `--token-latency` as an input argument. Take a look at lines 211 and 215: https://github.com/intel/intel-extension-for-pytorch/blob/main/examples/cpu/inference/python/llm/single_instance/run_generation.py#L211-L215 For now, try...

@andyluo7 As a workaround for now, modify line 211 in the run_generation.py script to `gen_ids = output` and it should work.
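
To spell out the workaround (a paraphrase of the relevant lines using the variable names I believe the script uses, not the verbatim source):

```python
# Around line 211 of run_generation.py (paraphrased). With --token-latency the
# script assumes generate() returns a (ids, latencies) tuple and indexes into it:
gen_ids = output[0] if args.token_latency else output

# Temporary workaround until the script is fixed: always take the output as-is.
gen_ids = output
```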