ByteTransformer
How can I produce a performance breakdown like the one in Fig. 3?
While reading this article, I noticed that the time breakdown in Figure 3 is very accurate. Which tool did you use for the time measurements?
I just measured the time manually by placing tic/toc timers around each segment, which gave T1, T2, ..., Tn.
With T_total = T1 + T2 + ... + Tn, the percentage for each segment i is Ti / T_total × 100%.
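A minimal Python sketch of this tic/toc breakdown, assuming each segment can be wrapped in a callable. The segment names and the `fake_work` stand-in are hypothetical placeholders, not ByteTransformer code. On a GPU, kernel launches are asynchronous, so the `sync` argument should block until queued work finishes (for example `torch.cuda.synchronize` in PyTorch); otherwise the timers measure only launch overhead.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_segments(segments: List[Tuple[str, Callable[[], None]]],
                  sync: Callable[[], None] = lambda: None) -> Dict[str, float]:
    """Time each segment with tic/toc and return elapsed seconds per segment.

    `sync` should block until queued device work has finished
    (e.g. torch.cuda.synchronize on a GPU); it is a no-op for CPU-only code.
    """
    times: Dict[str, float] = {}
    for name, fn in segments:
        sync()                      # don't charge earlier work to this segment
        tic = time.perf_counter()
        fn()
        sync()                      # wait for this segment's work to finish
        toc = time.perf_counter()
        times[name] = toc - tic     # Ti for this segment
    return times


def breakdown(times: Dict[str, float]) -> Dict[str, float]:
    """Percentage of T_total spent in each segment: Ti / T_total * 100."""
    total = sum(times.values())     # T_total = T1 + T2 + ... + Tn
    return {name: t / total * 100.0 for name, t in times.items()}


if __name__ == "__main__":
    # Hypothetical stand-in workloads; replace with the real model stages.
    def fake_work(n: int) -> None:
        sum(i * i for i in range(n))

    segs = [
        ("attention", lambda: fake_work(200_000)),
        ("ffn", lambda: fake_work(100_000)),
        ("layernorm", lambda: fake_work(20_000)),
    ]
    for name, pct in breakdown(time_segments(segs)).items():
        print(f"{name:10s} {pct:5.1f}%")
```

Summing the segment times into T_total only gives a meaningful breakdown when the segments run one after another, which is the single-stream assumption stated in the reply.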
This measurement makes sense because BERT inference is a single-stream computational pipeline: the segments execute one after another, so their times add up to the total.
Alternatively, you could use NVIDIA Nsight Systems to measure the elapsed time; I would expect you to see a similar result.
Thank you for your reply. By "single-stream computational pipeline", do you mean that the time spent loading model weights from HBM into the cache is counted as part of the computation time? Or is the loading time overlapped with computation?