Model Analyzer GPU Memory Usage Differences
Version: nvcr.io/nvidia/tritonserver:24.01-py3-sdk
For a profiled model, the GPU Memory Usage (MB) shown in results/metrics-model-gpu.csv differs from the value in the model's result_summary.pdf.
In my case, metrics-model-gpu.csv shows 1592.8 while the PDF report shows 1031.
This could be my misunderstanding: do these two metrics represent the same thing? I am looking for the maximum GPU memory usage for a given model, so which would be the more accurate result?
Additional Context:
I am using an instance with two GPUs, though the model is limited to a single instance.
I have noticed that if I add up the GPU memory of both GPUs from the CSV and then divide by 2, (470.8 + 1592.8) / 2 = 1031.8, I get close to the PDF result. Could this be a coincidence?
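For reference, that check written out as a small Python sketch (the two values are the per-GPU readings from my metrics-model-gpu.csv):

```python
# Per-GPU "GPU Memory Usage (MB)" readings from results/metrics-model-gpu.csv
per_gpu_memory_mb = [470.8, 1592.8]

# Average across the two GPUs -- this is what lines up with the PDF value
average_mb = sum(per_gpu_memory_mb) / len(per_gpu_memory_mb)
print(f"average across GPUs: {average_mb:.1f} MB")  # -> 1031.8
```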
Hi @KimiJL, sorry for the slow response. I just returned from vacation.
I suspect that your observation is not a coincidence and that there is a bug. We will have to investigate further.
May I ask, were you running in local mode? Or docker or remote?
Hi @tgerdesnv, thanks for the response.
I was running with --triton-launch-mode=docker.
@KimiJL I have confirmed that the values in the PDFs are in fact the averages across the GPUs. The values in metrics-model-gpu.csv are the raw values per GPU. So, in your case, the total maximum memory usage by the model on your machine would be 470.8 + 1592.8 MB.
I will fix Model Analyzer to show total memory usage, or clarify the labels to indicate that it is average memory usage.
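In the meantime, a rough sketch of how you could compute the total from the CSV yourself. This assumes the column is named "GPU Memory Usage (MB)" as in your file, and that the file holds one row per GPU for the configuration you care about; adjust the filtering if it contains multiple configurations:

```python
import csv

# Sum the per-GPU "GPU Memory Usage (MB)" readings to get the model's
# total GPU memory footprint across all GPUs.
# Assumes one row per GPU for the configuration of interest; filter rows
# first if the CSV contains multiple model configurations.
total_mb = 0.0
with open("results/metrics-model-gpu.csv", newline="") as f:
    for row in csv.DictReader(f):
        total_mb += float(row["GPU Memory Usage (MB)"])

print(f"total GPU memory usage: {total_mb:.1f} MB")  # e.g. 470.8 + 1592.8 = 2063.6
```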
@tgerdesnv great, thank you for the clarification, that makes sense!