amazon-ecs-ami

Add Support for dlami-cloudwatch-agent in ecs-ami for GPU

Open. mostafafarzaneh opened this issue 1 year ago • 1 comment

DESCRIPTION:

We currently use the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI for our ECS instances. However, this AMI does not include the dlami-cloudwatch-agent, a crucial component present in the DLAMI (Deep Learning AMI GPU TensorFlow 2.12.0 (Ubuntu 20.04) 20230529).

Our specific requirement is to publish GPU utilization metrics to CloudWatch using the dlami-cloudwatch-agent. This capability is essential for monitoring and optimizing our GPU resources effectively.

EXPECTED BEHAVIOR:

We request an update to the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI to include support for the dlami-cloudwatch-agent. This addition will enable us to seamlessly integrate GPU utilization metrics into our CloudWatch monitoring infrastructure.

ADDITIONAL CONTEXT:

  • Current State: The dlami-cloudwatch-agent is present in DLAMI but absent in the mentioned ECS AMI.

  • Use Case: Our use case involves closely monitoring GPU utilization for better resource management and performance optimization.

IMPACT:

This enhancement will benefit users relying on the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI, enabling them to leverage CloudWatch for comprehensive GPU monitoring.

mostafafarzaneh, Jan 30 '24 06:01

Hi @mostafafarzaneh, are you able to install this package in userdata on instance startup? I'd be hesitant to add this package to the default AMI, since opting all customers into additional CloudWatch metrics would increase their bill for metrics they may not use or need.

sparrc, Apr 19 '24 22:04
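For readers who want an opt-in route in the meantime, here is a minimal Python sketch of publishing GPU utilization to CloudWatch from a script started at instance boot (for example, from user data). It is not the dlami-cloudwatch-agent and does not reproduce its behavior; it simply samples `nvidia-smi` and calls the CloudWatch `PutMetricData` API through boto3. The namespace `Custom/ECS/GPU`, the metric names, the `GpuIndex` dimension, and all function names are assumptions chosen for illustration. The sketch also assumes boto3 is installed on the instance and that the instance role allows `cloudwatch:PutMetricData`.

```python
import subprocess

# Standard nvidia-smi query properties; output is one CSV row per GPU.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,utilization.memory",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU readings."""
    readings = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, gpu_util, mem_util = fields
        try:
            reading = {
                "index": index,
                "gpu_util": float(gpu_util),
                "mem_util": float(mem_util),
            }
        except ValueError:
            continue  # e.g. "[N/A]" when a GPU does not report a value
        readings.append(reading)
    return readings


def build_metric_data(readings, instance_id):
    """Build the MetricData list passed to CloudWatch PutMetricData."""
    metric_data = []
    for r in readings:
        # Dimension names are hypothetical; pick whatever suits your dashboards.
        dimensions = [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "GpuIndex", "Value": r["index"]},
        ]
        for name, key in (
            ("GPUUtilization", "gpu_util"),
            ("GPUMemoryUtilization", "mem_util"),
        ):
            metric_data.append(
                {
                    "MetricName": name,
                    "Dimensions": dimensions,
                    "Value": r[key],
                    "Unit": "Percent",
                }
            )
    return metric_data


def publish_once(instance_id, namespace="Custom/ECS/GPU"):
    """Sample the GPUs once and send the readings to CloudWatch."""
    import boto3  # third-party; imported here so the helpers above run without it

    output = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    ).stdout
    metric_data = build_metric_data(parse_gpu_rows(output), instance_id)
    if metric_data:
        boto3.client("cloudwatch").put_metric_data(
            Namespace=namespace, MetricData=metric_data
        )
```

Calling `publish_once("<instance-id>")` on a timer (cron or a systemd timer, say) would keep the metrics flowing. Because the script only runs on instances that install it, the extra CloudWatch charges stay opt-in, which is the concern raised in the reply above.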