Matt Camp
I have an issue with cluster autoscaling for GPU nodes. I am using Karpenter as the cluster autoscaler and am trying to deploy NVIDIA Riva. The pod deployment spec...
This PR adds the ability to push training and evaluation metrics to InfluxDB (via Telegraf). When combined with https://github.com/aws-deepracer-community/deepracer-for-cloud/pull/159 it should allow for some nice interactive Grafana dashboards. As the...
This PR adds a docker-compose stack which launches three additional services:
- Telegraf to accept UDP push metrics and pass them to InfluxDB
- InfluxDB to store time-series metrics
- Grafana...
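As a rough illustration of the UDP push path described above, the sketch below formats a single metric in InfluxDB line protocol and sends it as one datagram. It is not taken from the PR: the helper names `format_point` and `send_point`, the measurement, tag, and field names, and the default address `127.0.0.1:8092` are all assumptions, and the formatter is simplified (numeric fields only, no escaping of special characters).

```python
import socket


def format_point(measurement, tags, fields, timestamp_ns=None):
    """Build one line-protocol record: measurement,tag=v field=v [timestamp].

    Simplified sketch: assumes numeric field values and names without
    spaces, commas, or equals signs (no escaping is performed).
    """
    tag_str = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_str = ",".join(f"{key}={value}" for key, value in fields.items())
    line = f"{measurement}{tag_str} {field_str}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def send_point(line, host="127.0.0.1", port=8092):
    """Send one record as a UDP datagram (host and port are assumed values)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical training metric; the names are illustrative only.
    point = format_point("training", {"model": "example"}, {"reward": 12.5})
    send_point(point)
```

UDP is fire-and-forget, so a push like this does not block or fail when no listener is running. The cost is that dropped datagrams go unnoticed.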
The feature enables each training session to have its config and custom_files stored within a subdir under `experiments/`. This simplifies being able to locate the config and files that were...