aws-parallelcluster-monitoring
Monitoring Dashboard for AWS ParallelCluster
Hi, while building the prometheus-slurm-exporter binary I hit this error during the head node post-install:
```
cd /home/ec2-user/aws-parallelcluster-monitoring/prometheus-slurm-exporter; git status --porcelain
fatal: detected dubious ownership in repository at '/home/ec2-user/aws-parallelcluster-monitoring/prometheus-slurm-exporter'
To add an exception...
```
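Git reports "dubious ownership" when the repository directory is owned by a different user than the one running the command. One possible workaround is to register the directory under Git's `safe.directory` setting before the build runs. The sketch below assumes the post-install step runs as a different user (for example root) than the owner of the checkout, which the report does not confirm; `mark_safe_directory` is a hypothetical helper name, not part of this repository.

```shell
# Hypothetical helper: register a repository path as trusted for the
# current user, using Git's standard safe.directory config key.
mark_safe_directory() {
    repo_dir=$1
    # Skip if the path is already listed, so repeated runs do not add duplicates.
    if git config --global --get-all safe.directory 2>/dev/null | grep -Fxq "$repo_dir"; then
        return 0
    fi
    git config --global --add safe.directory "$repo_dir"
}
```

It could be called as `mark_safe_directory /home/ec2-user/aws-parallelcluster-monitoring/prometheus-slurm-exporter` by the same user that later runs `git status`, since the setting is stored in that user's global Git config.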
Does this work with pcluster version 3.6?
New GPU instance types added to the AWS instance family
*Issue #, if available:*
*Description of changes:* Modified prometheus/prometheus.yml to add p4de.24xlarge and p5.48xlarge. These are new GPU instance types...
*Issue #, if available:* *Description of changes:* By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Great solution. However, the required IAM permissions grant a significant level of access to the head and compute nodes, which restricts the ability to deploy the solution into certain environments due to...
The cron daemon generates an email for each upload (every minute). Redirect the cron job's output to /dev/null or a log file.
```
Message 44:
From [email protected] Tue Aug 17 05:38:03 2021
Return-Path:
Date:...
```
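Cron mails whatever a job writes to stdout or stderr, so one way to stop the per-minute messages is to append a redirection to the crontab entry. The helper below is a minimal sketch of that idea; `silence_cron_entry` and the example script path are hypothetical, and the actual crontab line installed by this project is not shown in the report.

```shell
# Hypothetical helper: append an output redirection to a crontab entry so
# that cron has no job output left to mail.
# $1: the existing crontab entry (schedule plus command)
# $2: optional log file; defaults to /dev/null
silence_cron_entry() {
    entry=$1
    log_target=${2:-/dev/null}
    # Send stdout to the target and fold stderr into it.
    printf '%s >> %s 2>&1\n' "$entry" "$log_target"
}

# Example with a placeholder schedule and script path:
silence_cron_entry '* * * * * /path/to/upload-script.sh'
```

An alternative on cron implementations that support it (such as cronie) is to put `MAILTO=""` at the top of the crontab, which disables mail for every job in that file rather than for a single entry.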
https://aws.amazon.com/grafana/ https://aws.amazon.com/prometheus/ Aggregate multiple clusters into a single managed Grafana instance.
The FSx cost metric does not work when the cluster is deployed with a pre-existing FSx file system. The issue is [here](https://github.com/aws-samples/aws-parallelcluster-monitoring/blob/main/custom-metrics/1h-cost-metrics.sh#L60).