openvino.genai
Fix the calculation of performance metrics
The throughput/latency calculation is wrong when batch size (bs) > 1: the reported values increase in an unexpected way.
tm_list in the following line should hold per-token times, not per-batch times:
tm_list = np.array(perf_metrics.raw_metrics.m_durations) / 1000 / 1000
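A minimal sketch of the proposed fix, using made-up numbers rather than real pipeline output. It assumes each entry of m_durations is the duration in microseconds of one generation step covering the whole batch, as described above; that assumption has not been checked against the library. The helper name per_token_times and the sample values are hypothetical.

```python
import numpy as np

def per_token_times(step_durations_us, batch_size):
    """Convert per-step durations (microseconds) into per-token times (seconds).

    Assumption: each step produces one token for every sequence in the batch,
    so one step duration is shared by `batch_size` tokens.
    """
    step_seconds = np.asarray(step_durations_us, dtype=float) / 1000 / 1000
    return step_seconds / batch_size

# Hypothetical stand-in for perf_metrics.raw_metrics.m_durations
durations_us = [40_000, 42_000, 38_000]
batch_size = 4

tm_list = per_token_times(durations_us, batch_size)
tokens_per_second = 1.0 / tm_list.mean()
```

Without the division by batch_size, the same data would give a per-token time of 0.04 s instead of 0.01 s, so the reported latency would grow with bs even though each step still produces bs tokens.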
Thanks,
Pengfei