intel-extension-for-transformers
Fix to use token latency to measure performance
Type of Change
Bug fix: use token latency instead of total inference time to measure performance.
Description
The workshop notebook measures total inference time for performance instead of token latency. This change switches it to use token latency.
Expected Behavior & Potential Risk
The notebook reports token latency for performance measurement.
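A minimal sketch of the difference between the two metrics, using a hypothetical `generate_tokens` stand-in for the model's per-token decode loop (not the notebook's actual code): total inference time lumps every token into one number, while token latency times each decode step individually.

```python
import time


def generate_tokens(prompt, max_new_tokens=8):
    # Hypothetical stand-in for a model's incremental decode loop;
    # the real notebook would call the model's generate/decode step here.
    for _ in range(max_new_tokens):
        time.sleep(0.01)  # simulate per-token compute
        yield "tok"


latencies = []
start = time.perf_counter()
prev = start
for tok in generate_tokens("hello"):
    now = time.perf_counter()
    latencies.append(now - prev)  # per-token latency (the fix)
    prev = now
total_time = time.perf_counter() - start  # total inference time (the old metric)

avg_token_latency = sum(latencies) / len(latencies)
print(f"total inference time: {total_time:.3f} s")
print(f"average token latency: {avg_token_latency * 1000:.1f} ms/token")
```

Reporting the per-token average (and optionally first-token latency separately) makes results comparable across runs that generate different numbers of tokens, which total inference time does not.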
How has this PR been tested?
Manually tested on AWS.
Dependency Change?
No.