dstack
[Meta] Improve `kubernetes` backend
Essential:
- [x] Request resources according to the `dstack` configuration
- [x] Multi-node support (distributed tasks running on fleets with cluster placement)
Strategic:
- [x] AMD GPU support
- [ ] Allow configuring multiple clusters per backend (e.g. per region)
- [ ] Auto-scaling support (ideally, find a way to support it for any cloud)
Improvements:
- [x] Update the jump pod: use a lightweight image, restrict SSH access (see TODOs in `_create_jump_pod_service`)
- [x] Test and update (if required) the gateway functionality on managed/self-hosted Kubernetes other than EKS (see TODO in `KubernetesCompute.create_gateway`)