Bake Node Exporter and DCGM Exporter into Azure HPC Images under /opt/azurehpc/monitoring

Open • Daramfon10 opened this issue 8 months ago • 1 comment

Background: As part of our observability pipeline for Azure HPC VMs, we are leveraging the Azure Monitor Agent (AMA) to scrape and publish telemetry. We rely on exporters to collect this telemetry. However, AMA does not currently bundle exporters (e.g., Node Exporter, DCGM Exporter) directly.

To simplify the startup process and ensure consistency across nodes, we propose embedding the exporters into the image itself. This ensures that exporters are always present at a known path (/opt/azurehpc/monitoring/) and AMA can reliably start them via docker run or by executing local binaries.

This approach reduces runtime dependencies, speeds up telemetry availability, and reduces the risk of runtime failures during scale-out caused by image pull delays, network issues, or inconsistencies in exporter versions and file paths across nodes.
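As a rough illustration of what a known path makes possible, a startup step could check that the exporters are present before trying to launch them. This is only a sketch: the root path comes from this issue, but the per-exporter subdirectory names (`node_exporter`, `dcgm-exporter`) and the helper `missing_exporters` are hypothetical, and the actual layout would be decided when the image is built.

```python
from pathlib import Path

# Root path proposed in this issue.
MONITORING_ROOT = Path("/opt/azurehpc/monitoring")

# Hypothetical subdirectory names, one per exporter; the issue does not fix these.
EXPECTED_EXPORTERS = ("node_exporter", "dcgm-exporter")


def missing_exporters(root=MONITORING_ROOT, names=EXPECTED_EXPORTERS):
    """Return the exporter names that have no directory under root."""
    root = Path(root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_exporters()
    if missing:
        print("missing exporters:", ", ".join(missing))
    else:
        print("all exporters present under", MONITORING_ROOT)
```

A check like this could run once at node startup, so that a missing exporter is reported immediately instead of surfacing later as absent telemetry.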

Ask: We request that both the Node Exporter and DCGM Exporter directories be pre-baked into the Azure HPC images under /opt/azurehpc/monitoring. Links to both repositories:

  • Node exporter: https://github.com/prometheus/node_exporter/tree/master
  • DCGM exporter: https://github.com/NVIDIA/dcgm-exporter

Daramfon10, Jun 11 '25 18:06

Why not bundle this in AMA itself? Not every user uses our HPC image.

arsdragonfly, Jun 16 '25 18:06