
dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.

Results 325 dstack issues

### Problem

These service configuration properties only affect how the proxy handles requests to the service:

- `port`
- `model`
- `https`
- `auth`
- `rate_limits`

Although these properties do...

feature
gateways
no-stale

### Problem

Currently, `dstack` only supports `pd-balanced` GCE disks. For higher-performance persistent storage on GCP, we want the ability to specify the underlying disk type (e.g. `hyperdisk-balanced`).

###...

feature
no-stale

### Problem

Currently, the `env:` configuration does not support variable interpolation. This means that when we define environment variables like:

```
env:
  - NUM_SHARD=$DSTACK_GPUS_NUM
```

The value is not evaluated...

feature
no-stale
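
To illustrate the interpolation this issue asks for, here is a minimal sketch using Python's standard `string.Template`. It is not how `dstack` processes `env:`; the helper `interpolate_env` and the value `"4"` for `DSTACK_GPUS_NUM` are hypothetical and used only for illustration.

```python
from string import Template


def interpolate_env(entries, context):
    """Expand $NAME / ${NAME} references in KEY=VALUE entries.

    `entries` is a list of "KEY=VALUE" strings; `context` holds the
    variables known ahead of time. Both names are hypothetical.
    """
    resolved = {}
    for entry in entries:
        key, _, raw_value = entry.partition("=")
        # Earlier entries are visible to later ones; unknown variables
        # are left untouched by safe_substitute instead of raising.
        resolved[key] = Template(raw_value).safe_substitute({**context, **resolved})
    return resolved


# Assumed value for illustration only.
env = interpolate_env(["NUM_SHARD=$DSTACK_GPUS_NUM"], {"DSTACK_GPUS_NUM": "4"})
print(env)  # {'NUM_SHARD': '4'}
```

`safe_substitute` is used so that a reference to an undefined variable stays verbatim in the value rather than aborting the whole configuration.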

Steps to reproduce:

1. Create an SSH fleet with one or more hosts
2. Disable the connection between the `dstack` server and the hosts of the fleet
3. Wait for...

ux
ssh-fleets
no-stale

Following #2455. A significant part of the gateway provisioning time is spent installing dependencies. Moving to uv would reduce it to a few seconds. For new gateways, uv can be installed...

gateways
no-stale
enhancement

### Problem

It can take several minutes (10+) to pull the Docker image to the runner instance depending on the image size and network speed. Currently, `dstack` only shows that...

feature
ux
no-stale

### Problem

Currently, both `dstack server` and `dstack` CLI require direct SSH access to VMs. Similar to #2554 for GCP, AWS recommends [using SSH via Session Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-enable-ssh-connections.html) to enforce the...

feature
no-stale

Currently, we always require the user to specify `image` when using AMD. It would be cool if we provided a small, up-to-date AMD image with ROCm drivers.

amd
no-stale

### Problem

When applying new SSH fleets, I faced a "Provisioning timeout expired" with no further information. I had to restart the server using `--log-level=DEBUG`. This is okay, when I...

feature
major
troubleshooting
no-stale

### Problem

To follow security best practices, we need to lock down SSH access to GCP VMs to only IAP-authorized users. To do this, we can limit the source...

feature
major