ByteTransformer

How to make a performance breakdown like the one in Fig. 3?

Open chenhongyu2048 opened this issue 2 years ago • 2 comments

While reading this article, I noticed that the time breakdown in Figure 3 is very accurate. What tool did you use to take the time measurements?

chenhongyu2048 Aug 16 '23 08:08

I measured the time manually by wrapping each segment in a tic and toc, which gave T1, T2, ..., Tn. With T_total = T1 + T2 + ... + Tn, the percentage of each segment can be computed as T1/T_total * 100%, and so on. This measurement makes sense because BERT inference is a single-stream computational pipeline. Alternatively, you could use Nsight Systems to measure the elapsed time, and I would expect you to see a similar result.
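
As a minimal sketch of the tic/toc approach described above (not the code actually used for Figure 3), the percentages could be computed as follows. The helper names `time_segments` and `breakdown` and the two sleep-based segments are hypothetical stand-ins; for real GPU kernels, which launch asynchronously, each segment would need to synchronize the device before the toc is read.

```python
import time


def time_segments(segments):
    """Run each (name, fn) pair in order and return {name: elapsed_seconds}."""
    timings = {}
    for name, fn in segments:
        tic = time.perf_counter()
        fn()  # assumption: fn blocks until its work is done (GPU code must synchronize)
        toc = time.perf_counter()
        timings[name] = toc - tic
    return timings


def breakdown(timings):
    """Convert per-segment times T_i into percentages T_i / T_total * 100."""
    total = sum(timings.values())
    return {name: t / total * 100.0 for name, t in timings.items()}


# Hypothetical stand-ins for two pipeline segments.
def fake_attention():
    time.sleep(0.02)


def fake_ffn():
    time.sleep(0.01)


timings = time_segments([("attention", fake_attention), ("ffn", fake_ffn)])
for name, pct in breakdown(timings).items():
    print(f"{name}: {pct:.1f}%")
```

Summing the segment times is only meaningful when the segments run back to back on one stream; if segments overlapped, the per-segment times would add up to more than the wall-clock total.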

yzhaiustc Aug 16 '23 16:08

Thank you for your reply. By "single-stream computational pipeline", do you mean that the time spent loading model weights from HBM into the cache is counted as part of the computation time? Or is the loading time overlapped with computation?

chenhongyu2048 Aug 17 '23 03:08