MLOS
ARM template auto-shutdown and cleanup helper scripts
We often run with `teardown: false` and tweak experiments to try different variations. As a result, stale resources can accumulate over time and keep running and incurring costs in the cloud (Azure).
To fix that I propose:
- [ ] Simply enable an auto-shutdown policy that somehow (*) detects idle resources and shuts down the VM.
That could help with compute costs, but not allocation costs.
- [ ] (*) Not sure whether the native auto-shutdown feature detects idleness or just runs on a schedule; if it is only a schedule, we'd have to write something that runs on the VM.
- [ ] Could also configure the ARM templates to do cascading deletion of certain resources, but probably not all (e.g., shared storage, shared vnets, shared db, etc.)
- [ ] We need a script that can look at a shared database instance holding all experiment definitions, determine which resources may no longer be necessary, and propose deletions.
- [ ] We might need to add additional metadata to help track that.
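
As a starting point for the cleanup script item, here is a minimal sketch of the "propose deletions" logic. It is not tied to the Azure SDK or to any existing MLOS code: the tag names (`mlos-experiment-id`, `mlos-shared`), the `Resource` record, and the 7-day idle threshold are all assumptions made up for illustration. The idea is that each resource carries metadata naming the experiment that created it, and the script compares that against the set of experiment IDs still present in the shared database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical tag names; the real metadata scheme is still to be decided.
EXPERIMENT_TAG = "mlos-experiment-id"
SHARED_TAG = "mlos-shared"


@dataclass(frozen=True)
class Resource:
    """Hypothetical record for one cloud resource and its tracking metadata."""
    resource_id: str
    tags: dict
    last_used: datetime


def propose_deletions(resources, active_experiment_ids, now,
                      max_idle=timedelta(days=7)):
    """Return IDs of resources that look stale. Nothing is deleted here."""
    proposals = []
    for res in resources:
        # Shared storage, vnets, databases, etc. are never proposed.
        if res.tags.get(SHARED_TAG) == "true":
            continue
        exp_id = res.tags.get(EXPERIMENT_TAG)
        # No tracking metadata: leave for manual review rather than guess.
        if exp_id is None:
            continue
        # The owning experiment is still defined in the shared database.
        if exp_id in active_experiment_ids:
            continue
        # Orphaned and idle for long enough: propose it for deletion.
        if now - res.last_used >= max_idle:
            proposals.append(res.resource_id)
    return proposals


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    old = now - timedelta(days=30)
    inventory = [
        Resource("vm-exp1", {EXPERIMENT_TAG: "exp1"}, old),
        Resource("vm-exp2", {EXPERIMENT_TAG: "exp2"}, old),
        Resource("shared-storage", {SHARED_TAG: "true"}, old),
    ]
    # Only "exp1" is still defined, so only vm-exp2 is proposed.
    print(propose_deletions(inventory, {"exp1"}, now))
```

The function only returns candidates so that a person can confirm before anything is removed. Skipping untagged resources is a deliberately conservative choice, and it is also why the extra tracking metadata mentioned above would be needed.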