alexsin368
@andyluo7 When running with --token-latency, you also need to add the --ipex argument. Does it work for you when you run: python run.py --benchmark -m /model/chatglm2_6b/ --dtype bfloat16 --input-tokens 64...
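For reference, a minimal sketch of the command with both flags included. It only uses the arguments visible in this thread; the original command is truncated, so any additional arguments you were passing are omitted here:

```bash
# Benchmark run with IPEX enabled; --token-latency requires --ipex
python run.py --benchmark \
    -m /model/chatglm2_6b/ \
    --dtype bfloat16 \
    --input-tokens 64 \
    --ipex \
    --token-latency
```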
@andyluo7 we have merged a PR that will warn you if you try to use --token-latency without including the --ipex argument: https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/pull/2639 If you have no other...
Issue has been fixed. Closing issue.