[Feature]: Support dstack volumes

Open r4victor opened this issue 1 year ago • 0 comments

Problem

Currently, dstack lacks built-in functionality that would allow users to persist data between runs. Cloud providers usually provide data persistence via network volumes. The proposal is to introduce volumes to dstack.

  • There should be a way to register an existing volume, such as an EBS volume, with dstack.
  • Users may not want to create volumes themselves, so dstack should also be able to create and manage volumes.

Solution

[!IMPORTANT] Below is how this feature is planned to be implemented now.

First, there is a command to add an existing volume to dstack:

dstack volume register \
	--name my-volume \
	--backend aws \
	--type ebs \
	--region eu-west-1 \
	--volume-id ebs-volume-id

After the volume is added, it can be mounted in a run:

type: task
commands:
  - ...
volumes:
  - name: my-volume
    path: /my_data

dstack will try to provision the instance in the backend/region of that volume. In case of no availability, the run will fail – other backends/regions won’t be tried since the specified volume cannot be mounted there.
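The placement rule above can be sketched in a few lines. This is only an illustration of the described behavior, not dstack's actual code: the `Volume`, `Offer`, and `NoCapacityError` types and the `offers_for_volume` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    # Hypothetical record for a registered volume.
    name: str
    backend: str
    region: str


@dataclass(frozen=True)
class Offer:
    # Hypothetical record for an instance offer from a backend.
    backend: str
    region: str
    instance_type: str
    available: bool


class NoCapacityError(RuntimeError):
    """Raised when no instance can be provisioned where the volume lives."""


def offers_for_volume(volume: Volume, offers: list[Offer]) -> list[Offer]:
    """Keep only available offers in the volume's own backend and region.

    Offers elsewhere are discarded rather than used as a fallback,
    because the volume cannot be mounted there.
    """
    matching = [
        offer
        for offer in offers
        if offer.available
        and offer.backend == volume.backend
        and offer.region == volume.region
    ]
    if not matching:
        raise NoCapacityError(
            f"no capacity in {volume.backend}/{volume.region} "
            f"for volume {volume.name!r}"
        )
    return matching
```

With a volume in aws/eu-west-1, an available offer in aws/us-east-1 is ignored, and the call raises instead of falling back to it.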

Future plans

[!IMPORTANT] Below are some thoughts on how this feature can evolve in the future.

Creating volumes via CLI

Besides dstack volume register, there will be a command that creates new volumes managed by dstack:

dstack volume add \
	--name my-volume \
	--type aws-ebs \
	--region eu-west-1 \
	--size 500GB \
	--volume-type gpt

Volume support can be implemented for AWS, Azure, and GCP. It can also be implemented for other clouds but with certain restrictions (e.g. at most one mount per run).
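A per-backend restriction like the one mentioned above could be checked before a run is submitted. The sketch below is an assumption about how such a check might look: the `MAX_MOUNTS_PER_RUN` table, the `some-cloud` placeholder backend, and `check_mount_limit` are all hypothetical and do not reflect any actual dstack limits.

```python
from typing import Optional

# Hypothetical per-backend limits; None means no limit is enforced here.
# "some-cloud" is a placeholder, not a real dstack backend name.
MAX_MOUNTS_PER_RUN: dict[str, Optional[int]] = {
    "aws": None,
    "azure": None,
    "gcp": None,
    "some-cloud": 1,
}


def check_mount_limit(backend: str, mount_paths: list[str]) -> None:
    """Raise ValueError if the run asks for more mounts than the backend allows."""
    if backend not in MAX_MOUNTS_PER_RUN:
        raise ValueError(f"volumes are not supported for backend {backend!r}")
    limit = MAX_MOUNTS_PER_RUN[backend]
    if limit is not None and len(mount_paths) > limit:
        raise ValueError(
            f"backend {backend!r} allows at most {limit} mount(s) per run, "
            f"got {len(mount_paths)}"
        )
```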

We'll begin by supporting registering existing volumes (dstack volume register) and only supporting AWS EBS volumes. Then, we can implement dstack volume add and support more backends.

Creating and registering volumes via YAML

An alternative to CLI volume management that we're not implementing at the moment is to define volumes directly in run configurations:

type: task
commands:
  - ...
mounts:
  - volume: dstack-volume
    path: /dstack_data
  - volume: user-volume
    path: /user_data
volumes: # the volume will be created by dstack if it does not exist
  - name: dstack-volume
    spec:
      size: 500GB
external_volumes: # the existing user created volume is used
  - name: user-volume
    type: aws-ebs
    region: eu-west-1
    volume_id: ebs-volume-id
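To make the relationship between mounts, volumes, and external_volumes concrete, here is a rough sketch of how such a configuration might be resolved once parsed into a dict. It assumes the keys shown in the YAML above; `resolve_mounts` and the `action` labels are hypothetical names used only for illustration.

```python
def resolve_mounts(config: dict) -> list[dict]:
    """Match each mount to a declared volume and decide how to obtain it.

    Volumes under "volumes" are created if they do not exist yet;
    volumes under "external_volumes" must already exist and are used as-is.
    """
    managed = {v["name"]: v for v in config.get("volumes", [])}
    external = {v["name"]: v for v in config.get("external_volumes", [])}

    # A name declared in both sections would be ambiguous.
    duplicates = managed.keys() & external.keys()
    if duplicates:
        raise ValueError(f"volume declared twice: {sorted(duplicates)}")

    resolved = []
    for mount in config.get("mounts", []):
        name = mount["volume"]
        if name in managed:
            action = "create_if_missing"
        elif name in external:
            action = "use_existing"
        else:
            raise ValueError(f"mount references undeclared volume {name!r}")
        resolved.append({"volume": name, "path": mount["path"], "action": action})
    return resolved
```

For the configuration above, dstack-volume would resolve to `create_if_missing` and user-volume to `use_existing`.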

Workaround

No response

Would you like to help us implement this feature by sending a PR?

Yes

r4victor, Apr 23 '24 09:04