hydra
hydra copied to clipboard
IAC for AWS Batch Infra
Create terraform scripts to launch aws batch infra for training. Import components
- Separate queues and compute environments for CPU vs GPU workloads
- Use launch templates to attach additional disk storage to training instances
- Alerting for jobs queuing up in Batch states
- Slack alerting configuration
- FIne grained IAM perms