model_analyzer icon indicating copy to clipboard operation
model_analyzer copied to clipboard

Model Analyzer - NO Gpu Metrics collection

Open aptmess opened this issue 10 months ago • 1 comments

Description

Good day! I have the same issue as it was written here

I am using Model Analyzer and while running model-analyzer profile script the output says that there is no gpu utilization metrics:

 WARNING: Server output field "gpu_utilization", has no data
No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.

In reports there are also zero metrics for GPU - tried on A800 and A100 GPU's (for the last one I expect this behavior due to known issues, but for A800 - no). How it can be fixed?

Triton Information

I am using triton nvcr.io/nvidia/tritonserver:24.03-py3-sdk.

To Reproduce

  1. Start SDK container:
mkdir -p profiling
mkdir -p profiling/my_model
mkdir -p profiling/my_model/analyzer
mkdir -p profiling/my_model/profile_results
docker pull nvcr.io/nvidia/tritonserver:24.03-py3-sdk
docker run -it --rm \
    -v <full_path_to_model_repository>:<full_path_to_model_repository> \
    -v ./profiling/:/profiling/ \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --net=host \
    --gpus=0 \
    nvcr.io/nvidia/tritonserver:24.03-py3-sdk
  1. Create analyzer.yaml file inside container:
model_repository: <full_path_to_model_repository>

run_config_search_max_instance_count: 8
run_config_search_min_model_batch_size: 1
run_config_search_min_concurrency: 1
run_config_search_max_concurrency: 1024
profile_models:
  - my_model
  1. Start model-analyzer:
model-analyzer profile \
  --triton-launch-mode=docker  \
  --output-model-repository-path /profiling/my_model/analyzer/ \
  --export-path /profiling/my_model/profile_results/ \
  --override-output-model-repository  
  --triton-docker-image=nvcr.io/nvidia/tritonserver:24.03-py3 \
  --client-protocol=http \
  -f analyzer.yaml  \
  --always-report-gpu-metrics

Output:

[Model Analyzer] WARNING: Server output field "gpu_used_memory", has no data
[Model Analyzer] WARNING: Server output field "gpu_utilization", has no data
[Model Analyzer] WARNING: Server output field "gpu_power_usage", has no data
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_power_usage' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_used_memory' found in the model's measurement. Possibly comparing measurements across devices.
[Model Analyzer] WARNING: No GPU metric corresponding to tag 'gpu_utilization' found in the model's measurement. Possibly comparing measurements across devices.

And in the summary report:

image

Expected behavior

GPU metrics would collect and this metrics would be shown in summary:

Thanks in advance for checking and fixing this issue!

aptmess avatar Apr 18 '24 10:04 aptmess

Hi @aptmess, I'm sorry to hear you are running into an issue where no memory is being reported. We will have to investigate further to try to figure out why no memory is being reported from the server.

Are you able to run nvidia-smi at the same time to confirm that the model is in fact running on the GPU and that the GPU has non-zero utilization and memory usage?

tgerdesnv avatar Apr 18 '24 17:04 tgerdesnv

Closing due to inactivity.

tgerdesnv avatar May 13 '24 15:05 tgerdesnv