Matt Camp

Results 6 issues of Matt Camp

I have an issue when using cluster autoscaling for GPU nodes. I am using Karpenter as the cluster autoscaler and I'm trying to deploy NVIDIA Riva. The pod deployment spec...

lifecycle/stale

This PR adds the ability to push training and evaluation metrics to InfluxDB (via Telegraf). When combined with https://github.com/aws-deepracer-community/deepracer-for-cloud/pull/159 it should allow for some nice interactive Grafana dashboards. As the...

This PR adds a docker-compose stack which launches three additional services:
- Telegraf to accept UDP push metrics and pass to InfluxDB
- InfluxDB to store time-series metrics
- Grafana...

The feature enables each training session to have its config and custom_files stored within a subdirectory under `experiments/`. This makes it easier to locate the config and files that were...