QuPengfei
Results: 2 issues of QuPengfei
Throughput/latency calculation issue when bs > 1: the values increase in an unexpected way. The tm_list computed in the following line should be per token, not per batch. tm_list = np.array(perf_metrics.raw_metrics.m_durations) / 1000 / 1000...
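A minimal sketch of one possible reading of this report, not the fix adopted in the project. It assumes each entry of m_durations is the duration of one generation step for the whole batch, in microseconds, so that dividing by the batch size gives a per-token figure. The helper name per_token_latencies_s and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
import numpy as np


def per_token_latencies_s(durations_us, batch_size):
    """Convert per-step durations (microseconds) into per-token latencies (seconds).

    Assumption: every step produces one token for each of the `batch_size`
    sequences, so one step's duration is shared by `batch_size` tokens.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # Same unit conversion as in the issue: microseconds -> seconds.
    step_s = np.asarray(durations_us, dtype=float) / 1000 / 1000
    # Per-batch step time divided across the tokens generated in that step.
    return step_s / batch_size


# Made-up sample values, not measurements from any real run.
durations_us = [40_000.0, 42_000.0, 38_000.0]
batch_size = 4

per_batch = np.asarray(durations_us) / 1000 / 1000
per_token = per_token_latencies_s(durations_us, batch_size)

print("per-batch step time (s):", per_batch)
print("per-token latency (s): ", per_token)
print("throughput (tokens/s): ", 1.0 / per_token.mean())
```

With batch_size = 1 the two arrays are identical, which matches the report that the discrepancy only shows up when bs > 1.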
llm_bench
I saw the issue with chatglm2-6b. It runs successfully with numactl -m 0 -C 0-23. It fails with numactl -m 0 -C 0-31, or 0-47, or...