dstack
dstack is an open-source alternative to Kubernetes, designed to simplify the development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA and AMD GPUs as well as TPUs.
### Steps to reproduce After the examples were moved around, links from the Clusters examples to GitHub are broken: https://github.com/dstackai/dstack/blob/a444b846e51a5b025c0aed081a65d6b5296508fa/examples/clusters/a3high/README.md#L229 Also, the examples themselves are not reproducible, since the paths to configurations are...
### Steps to reproduce 1. Run `dstack offer --group-by gpu,backend`. 2. Observe that, for each GPU, it shows all REGIONS the backend has, even if the GPU is only available for...
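The expected behaviour can be illustrated with a minimal sketch of grouping offers by `(gpu, backend)` while collecting only the regions where that GPU actually appears in an offer. The `Offer` record and `group_regions` function are hypothetical simplifications for illustration, not dstack's actual models or implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified offer record; not dstack's real offer model.
    gpu: str
    backend: str
    region: str


def group_regions(offers):
    """Map each (gpu, backend) pair to the sorted regions where that GPU is offered."""
    groups = defaultdict(set)
    for offer in offers:
        # Only regions that appear in an offer for this exact GPU are recorded,
        # rather than every region the backend supports.
        groups[(offer.gpu, offer.backend)].add(offer.region)
    return {key: sorted(regions) for key, regions in groups.items()}
```

For example, offers of H100 in `us-east-1` and `us-west-2` and A10G in `eu-west-1` on the same backend would yield two groups, each listing only its own regions.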
### Problem Users should be able to see every project and their own role in each. ### Solution - [ ] Add a Role column on the user project list page...
### Steps to reproduce Apply the configuration:
```yaml
type: service
image: nginx
port: 80
replicas: 0..1
scaling:
  metric: rps
  target: 1
```
### Actual behaviour Until the first request hits...
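To see why a `replicas: 0..1` service with an RPS target of 1 sits at zero replicas until traffic arrives, here is a minimal sketch of a generic target-tracking rule: compute enough replicas so each handles about `target` requests per second, then clamp to the configured range. This is an assumed, textbook rule for illustration; `desired_replicas` is a hypothetical helper and not dstack's actual autoscaler.

```python
import math


def desired_replicas(current_rps: float, target_rps: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Generic target-tracking estimate (hypothetical helper, not dstack's autoscaler)."""
    if target_rps <= 0:
        raise ValueError("target_rps must be positive")
    # Enough replicas that each one serves roughly target_rps requests per second.
    needed = math.ceil(current_rps / target_rps)
    # Clamp to the configured replica range, e.g. 0..1.
    return max(min_replicas, min(max_replicas, needed))
```

Under this rule, zero observed RPS yields zero replicas, so the very first request must be held or queued while a replica scales up from zero; any nonzero RPS yields one replica because of the upper bound.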
Currently, dstack Models work with the Chat Completions API, but in March OpenAI introduced the Responses API ([migration guide](https://platform.openai.com/docs/guides/migrate-to-responses)). OpenAI says that the Chat Completions API will not be deprecated,...
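As a rough illustration of the difference between the two request shapes, here is a minimal sketch that translates a simple Chat Completions request body (sent to `/v1/chat/completions`) into a Responses API body (sent to `/v1/responses`): `messages` becomes `input`, and `max_tokens` becomes `max_output_tokens`. It covers only plain text messages; tool calls, streaming events, and other fields differ between the APIs and are not handled. `chat_to_responses_payload` is a hypothetical helper, and how dstack Models would actually support the Responses API is an open question in this issue.

```python
def chat_to_responses_payload(chat_payload: dict) -> dict:
    """Translate a minimal Chat Completions body into a Responses API body.

    Hypothetical helper: handles only `model`, `messages`, and `max_tokens`.
    """
    # The Responses API accepts a list of role/content messages as `input`.
    payload = {"model": chat_payload["model"], "input": chat_payload["messages"]}
    if "max_tokens" in chat_payload:
        # The output-length limit is named differently in the Responses API.
        payload["max_output_tokens"] = chat_payload["max_tokens"]
    return payload
```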
### Problem If a spot instance is interrupted, `dstack` only detects the interruption after the instance has been unreachable for some time: - If the job is running, it will be...
### Problem It would be nice if we had the ability to simply restart a run ### Solution If given the run name, get the last run ID and restart...
We recently debugged a case where running multiple server replicas led to high DB load, many active DB sessions, and extremely slow DB queries. This turned out to be caused...
Learn more: https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler. I suppose we need to support both the flex and calendar modes.
### Problem Updating any fleet properties requires stopping all runs that are using the fleet and recreating it, including recreating the underlying VMs in cloud fleets. ### Solution Support in-place...