Leo Zhao

Results 16 comments of Leo Zhao

@luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just speed?

> @luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just speed?

@luotao1 Any feedback on this question?

Then how do we judge whether the speed is comparable with a V100? For example, V100: BS=1, speed 3.4 steps/s; Xeon: BS=1, CPU_NUM=8, speed 0.43 steps/s. Are they equivalent?

Yes, speed is not linear with CPU_NUM, but I checked the code and found that this speed reflects iteration execution time, not the number of samples actually processed. It means: for each iteration, the processed...

I see that the calculation logic in the benchmark run.sh uses samples/s, which counts both CPU_NUM and BS. I think that makes more sense.
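To make the samples/s comparison concrete, here is a minimal sketch using the figures quoted earlier in this thread. It assumes each of the CPU_NUM places processes a full batch of BS samples per step, so samples/s = steps/s * BS * CPU_NUM. The function name is hypothetical and this is not the actual logic in run.sh.

```python
def samples_per_second(steps_per_sec: float, batch_size: int, num_places: int = 1) -> float:
    """Convert steps/s to samples/s.

    Assumption: every place (device or CPU replica) processes
    `batch_size` samples in each step.
    """
    return steps_per_sec * batch_size * num_places


# Figures quoted in the thread: V100 with BS=1, Xeon with BS=1 and CPU_NUM=8.
v100 = samples_per_second(3.4, batch_size=1)
xeon = samples_per_second(0.43, batch_size=1, num_places=8)

print(f"V100: {v100:.2f} samples/s")  # prints "V100: 3.40 samples/s"
print(f"Xeon: {xeon:.2f} samples/s")  # prints "Xeon: 3.44 samples/s"
```

Under that assumption the two configurations come out roughly equal in samples/s, even though the raw steps/s numbers differ by about 8x.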

It seems libcustom_tpc_perf_lib.so is not packaged into the whl; we need to add libcustom_tpc_perf_lib.so to the whl as well.
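One quick way to confirm whether a shared library made it into a wheel is to list the archive contents, since a .whl file is a zip archive. The sketch below uses only the Python standard library; the helper name and the wheel path in the usage comment are hypothetical.

```python
import zipfile


def wheel_contains(wheel_path: str, filename: str) -> bool:
    """Return True if any entry in the wheel has the given base file name."""
    with zipfile.ZipFile(wheel_path) as whl:
        # Archive entries use "/" as the separator regardless of platform.
        return any(name.rsplit("/", 1)[-1] == filename for name in whl.namelist())


# Example usage (the wheel path is a placeholder):
# wheel_contains("dist/your_package.whl", "libcustom_tpc_perf_lib.so")
```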

This issue can only be reproduced on the PRC SKU, on which FP32 GEMM is disabled.