
Regarding the calculation of the duration in the profiling phase

Open doudouxx opened this issue 8 months ago • 0 comments

Hello Vidur,

Thank you for sharing your work. While reading the code, I encountered a question.

I am analyzing the profiling part of the code, which is divided into two parts: MLP and attention.

For MLP, the duration of an event is computed by first finding all of its children events, then finding the correlated events for each child, and finally summing the durations of those correlated events (the function get_operation_time_stats() in Vidur).

For attention, the code uses the sarathi module, which computes sum([e.cuda_time_total for e in trace.key_averages()]), that is, it sums the CUDA time of every event in the trace (the function handle_trace() in Sarathi).

So the MLP path only counts events correlated with the operation being measured, while the attention path counts everything recorded in the trace. Is my understanding correct? If so, the two methods seem inconsistent. What was the reasoning behind using different approaches?
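To make the difference concrete, here is a minimal sketch of the two aggregation strategies on a toy trace. Everything in it is hypothetical and written only for illustration: the KernelEvent and CpuOpEvent classes, the functions correlated_kernel_time() and total_kernel_time(), and the durations are assumptions, not code or data from Vidur or Sarathi.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KernelEvent:
    """Hypothetical GPU-side event, tagged with the id of the launch that produced it."""
    name: str
    correlation_id: int
    duration_us: float


@dataclass
class CpuOpEvent:
    """Hypothetical CPU-side operator event that may launch kernels and have children."""
    name: str
    correlation_ids: List[int] = field(default_factory=list)
    children: List["CpuOpEvent"] = field(default_factory=list)


def correlated_kernel_time(op: CpuOpEvent, kernels: List[KernelEvent]) -> float:
    """MLP-style: sum only the kernels correlated with `op` or its descendants."""
    ids = set()
    stack = [op]
    while stack:
        node = stack.pop()
        ids.update(node.correlation_ids)
        stack.extend(node.children)
    return sum(k.duration_us for k in kernels if k.correlation_id in ids)


def total_kernel_time(kernels: List[KernelEvent]) -> float:
    """Attention-style: sum every kernel in the trace, related to the op or not."""
    return sum(k.duration_us for k in kernels)


# Toy trace: two kernels belong to the MLP op, one memcpy does not.
kernels = [
    KernelEvent("gemm", correlation_id=1, duration_us=100.0),
    KernelEvent("activation", correlation_id=2, duration_us=20.0),
    KernelEvent("memcpy", correlation_id=3, duration_us=5.0),
]
mlp = CpuOpEvent(
    "mlp",
    children=[
        CpuOpEvent("linear", correlation_ids=[1]),
        CpuOpEvent("act", correlation_ids=[2]),
    ],
)

mlp_style = correlated_kernel_time(mlp, kernels)
attention_style = total_kernel_time(kernels)
print(mlp_style, attention_style)
```

In this toy trace the two results differ by exactly the duration of the kernel that is not correlated with the op. If the profiled region contains nothing but the operation of interest, the two strategies would give the same number, which may be why both are in use.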

doudouxx, Apr 12 '25 02:04