
Model Analyzer GPU Memory Usage Differences

Open KimiJL opened this issue 11 months ago • 5 comments

Version: nvcr.io/nvidia/tritonserver:24.01-py3-sdk

For a profiled model, the GPU Memory Usage (MB) shown in results/metrics-model-gpu.csv differs from the value in the model's result_summary.pdf.

In my case, metrics-model-gpu.csv shows 1592.8 while the PDF report shows 1031.

This could be my misunderstanding: do these two metrics represent the same thing? I am looking for the maximum GPU memory usage for a given model, so which one is the more accurate result?

KimiJL avatar Mar 25 '24 20:03 KimiJL

Additional Context:

I am running on an instance with two GPUs, though the model is limited to a single model instance.

I have noticed that if I add up the GPU memory of both GPUs from the CSV and then divide by 2, I get (470.8 + 1592.8) / 2 = 1031.8, which is very close to the PDF result. Could this be a coincidence? A quick sketch of that arithmetic, using the two per-GPU figures I copied from metrics-model-gpu.csv:
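```python
# Quick check of the hypothesis: is the PDF value the per-GPU average?
gpu_memory_mb = [470.8, 1592.8]  # per-GPU values taken from metrics-model-gpu.csv

average_mb = sum(gpu_memory_mb) / len(gpu_memory_mb)
total_mb = sum(gpu_memory_mb)

print(f"average across GPUs: {average_mb:.1f} MB")  # 1031.8, matches the PDF
print(f"total across GPUs:   {total_mb:.1f} MB")    # 2063.6
```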

KimiJL avatar Mar 25 '24 21:03 KimiJL

Hi @KimiJL, sorry for the slow response. I just returned from vacation.

I suspect that your observation is not a coincidence and that there is a bug. We will have to investigate further.

May I ask, were you running in local mode? Or docker or remote?

tgerdesnv avatar Apr 18 '24 16:04 tgerdesnv

Hi @tgerdesnv thanks for the response,

I was running with --triton-launch-mode=docker.

KimiJL avatar Apr 18 '24 17:04 KimiJL

@KimiJL I have confirmed that the values in the PDF are in fact averages across the GPUs. The values in metrics-model-gpu.csv are the raw per-GPU values. So, in your case, the total maximum memory usage by the model on your machine would be 470.8 + 1592.8 = 2063.6 MB.
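In the meantime, here is a minimal sketch of how you could compute that total yourself from the CSV. The "GPU Memory Usage (MB)" column name comes from the header you mentioned; the "Model Config Path" grouping column is an assumption, so adjust it if your CSV differs.

```python
import csv
from collections import defaultdict

# Sum the per-GPU "GPU Memory Usage (MB)" values in metrics-model-gpu.csv,
# grouped per model config. Column names are assumed; adjust as needed.
totals = defaultdict(float)
with open("results/metrics-model-gpu.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = row.get("Model Config Path") or row.get("Model", "unknown")
        totals[key] += float(row["GPU Memory Usage (MB)"])

for config, total_mb in totals.items():
    print(f"{config}: {total_mb:.1f} MB total across GPUs")
```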

I will fix Model Analyzer to show total memory usage, or clarify the labels to indicate that it is average memory usage.

tgerdesnv avatar Apr 24 '24 17:04 tgerdesnv

@tgerdesnv great, thank you for the clarification, that makes sense!

KimiJL avatar Apr 24 '24 18:04 KimiJL