intel-extension-for-transformers
Fix to use token latency to measure performance
Type of Change
Bug fix: use token latency instead of total inference time to measure performance.
Description
The workshop notebook measures total inference time for performance instead of token latency. This change switches it to use token latency.
Expected Behavior & Potential Risk
The notebook reports token latency for performance measurement.
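A minimal sketch of the difference between the two metrics, using a hypothetical `generate_tokens` stand-in for the model's per-token decode loop (not the notebook's actual code): total inference time lumps every token into one number, while token latency times each decode step individually.

```python
import time


def generate_tokens(prompt, max_new_tokens=8):
    # Hypothetical stand-in for a model's incremental decode loop;
    # the real notebook would call the model's generate/decode step here.
    for _ in range(max_new_tokens):
        time.sleep(0.01)  # simulate per-token compute
        yield "tok"


latencies = []
start = time.perf_counter()
prev = start
for tok in generate_tokens("hello"):
    now = time.perf_counter()
    latencies.append(now - prev)  # per-token latency (the fix)
    prev = now
total_time = time.perf_counter() - start  # total inference time (the old metric)

avg_token_latency = sum(latencies) / len(latencies)
print(f"total inference time: {total_time:.3f} s")
print(f"average token latency: {avg_token_latency * 1000:.1f} ms/token")
```

Reporting the per-token average (and optionally first-token latency separately) makes results comparable across runs that generate different numbers of tokens, which total inference time does not.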
How has this PR been tested?
Manually tested on AWS.
Dependency Change?
No.