amazon-ecs-ami
Add Support for dlami-cloudwatch-agent in ecs-ami for GPU
DESCRIPTION:
We are currently using the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI for our ECS instances. However, we have observed that this AMI lacks support for the dlami-cloudwatch-agent, a crucial component present in the DLAMI (Deep Learning AMI GPU TensorFlow 2.12.0 (Ubuntu 20.04) 20230529).
Our specific requirement is to publish GPU utilization metrics to CloudWatch using the dlami-cloudwatch-agent. This capability is essential for monitoring and optimizing our GPU resources effectively.
EXPECTED BEHAVIOR:
We request an update to the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI to include support for the dlami-cloudwatch-agent. This addition will enable us to seamlessly integrate GPU utilization metrics into our CloudWatch monitoring infrastructure.
ADDITIONAL CONTEXT:
- Current State: The dlami-cloudwatch-agent is present in DLAMI but absent in the mentioned ECS AMI.
- Use Case: Our use case involves closely monitoring GPU utilization for better resource management and performance optimization.
IMPACT:
This enhancement will benefit users relying on the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI, enabling them to leverage CloudWatch for comprehensive GPU monitoring.
Hi @mostafafarzaneh, are you able to install this package in user data on instance startup? I'd be reluctant to add this package to the default AMI, since opting all customers into additional CloudWatch metrics would lead to increased billing for metrics that they may not use or need.
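If installing the agent from user data is not practical, a small script started at boot could publish GPU utilization directly. The following is only a minimal sketch and not how the dlami-cloudwatch-agent works. It assumes `nvidia-smi` is available on the GPU instance, that `boto3` is installed, and that the instance role allows `cloudwatch:PutMetricData`. The `Custom/GPU` namespace, the metric names, and the function names are hypothetical choices made for illustration.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def read_gpu_csv():
    """Return one CSV line per GPU, without header or units."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def build_metric_data(csv_text, instance_id):
    """Convert nvidia-smi CSV text into a CloudWatch MetricData list."""
    metrics = []
    for line in csv_text.strip().splitlines():
        # Assumes every field is numeric; some GPUs may report "[N/A]" instead.
        index, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        dimensions = [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "GpuIndex", "Value": index},
        ]
        metrics.append({
            "MetricName": "GPUUtilization",  # hypothetical metric name
            "Dimensions": dimensions,
            "Value": float(util),
            "Unit": "Percent",
        })
        metrics.append({
            "MetricName": "GPUMemoryUtilization",  # hypothetical metric name
            "Dimensions": dimensions,
            "Value": 100.0 * float(mem_used) / float(mem_total),
            "Unit": "Percent",
        })
    return metrics


def publish(metric_data, namespace="Custom/GPU"):
    """Send the metrics to CloudWatch; the namespace is an assumed placeholder."""
    import boto3  # imported here so the parsing code runs without boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=metric_data
    )
```

A periodic job (for example a cron entry or systemd timer created from user data) could call `publish(build_metric_data(read_gpu_csv(), instance_id))` once a minute. Because this is opt-in per instance, only users who want the metrics would pay for them, which is the billing concern raised above.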