
[Requirement] Profiling operations for HugeCTR

Open jayz0123 opened this issue 2 years ago • 3 comments

Hi HugeCTR team,

Recently I used Nsight to profile a model that uses HugeCTR.

Unlike another tool, DLProf, which gives an operation breakdown for the model, Nsight's results are very low level, and it is quite difficult to find out the total time spent in each operation.

Is there a way to get a high-level operation profile for a HugeCTR model?

jayz0123 · Aug 12 '22 08:08

Hi @regnnighe, so far DLProf doesn't support HugeCTR, and there is no dedicated high-level profiling tool for HugeCTR. But I think better profiling support is a good requirement. Could you elaborate on what kind of high-level operation profile you are looking for? Would something like DLProf's breakdown be enough for your use case?

zehuanw · Aug 15 '22 01:08

@zehuanw Thank you for your reply! For example, when I use HugeCTR for the DLRM model, it would be nice to have a profiler that reports the time spent in different parts of the model, such as the embeddings, the bottom MLP, the top MLP, etc.

DLProf works well for PyTorch models because it shows operation labels, so I can tell how much time corresponds to each part of the model.
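As a rough workaround until such a tool exists, a per-part breakdown like this could be approximated by post-processing a profiler trace. The sketch below assumes you already have labeled time ranges (for example, NVTX ranges named `embedding`, `bottom_mlp`, `top_mlp`) expressed on the same GPU timeline as the kernel records. HugeCTR is not known to emit such ranges, so the `Range`/`Kernel` types, the labels, and the `time_per_part` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Range(NamedTuple):
    """A labeled time interval, e.g. an NVTX range (hypothetical input)."""
    label: str
    start_ns: int
    end_ns: int


class Kernel(NamedTuple):
    """A single GPU kernel execution taken from a profiler trace."""
    name: str
    start_ns: int
    end_ns: int


def time_per_part(ranges: Iterable[Range], kernels: Iterable[Kernel]) -> dict:
    """Sum kernel durations (ns) per labeled range.

    Each kernel is attributed to the first range (by start time) whose
    interval contains the kernel's start; kernels outside every range are
    counted under "unlabeled".
    """
    ordered = sorted(ranges, key=lambda r: r.start_ns)
    totals = defaultdict(int)
    for k in kernels:
        label = next(
            (r.label for r in ordered if r.start_ns <= k.start_ns < r.end_ns),
            "unlabeled",
        )
        totals[label] += k.end_ns - k.start_ns
    return dict(totals)


if __name__ == "__main__":
    # Made-up timestamps, only to illustrate the attribution logic.
    ranges = [
        Range("embedding", 0, 100),
        Range("bottom_mlp", 100, 150),
        Range("top_mlp", 150, 300),
    ]
    kernels = [
        Kernel("k1", 10, 40),
        Kernel("k2", 50, 90),
        Kernel("gemm", 110, 140),
        Kernel("gemm", 160, 260),
        Kernel("x", 400, 410),
    ]
    print(time_per_part(ranges, kernels))
    # {'embedding': 70, 'bottom_mlp': 30, 'top_mlp': 100, 'unlabeled': 10}
```

Attributing by the kernel's start time keeps the logic simple; a kernel that straddles two ranges is counted entirely in the first one, which may matter for long-running kernels.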

jayz0123 · Aug 15 '22 03:08

Thank you for the feedback! I've relabeled this as a functional feature requirement. We will track planning and development in this issue.

zehuanw · Aug 16 '22 09:08