
[Feature]: Local disks

Open jvstme opened this issue 1 year ago • 3 comments

Context

Many cloud providers bundle one or several (e.g. 16) local disks with some instance types.

Local disks have these traits:

  • Physically attached to the host and hence provide better performance
  • Included in the instance price, no way to opt out
  • Provided in addition to the main OS disk
  • Typically do not have a file system
  • Not persistent: the data typically survives instance restarts but is lost when the instance stops
  • Storage capacity is fixed and may vary depending on the instance type
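As a rough illustration of the "no file system" trait, local disks could be told apart from the OS disk by inspecting `lsblk` output. This is only a heuristic sketch, not something dstack does today: the function names are made up for this example, and disks that a provider pre-formats or mounts would be skipped by it.

```python
import json
import subprocess


def find_unformatted_disks(lsblk_json: str) -> list[str]:
    """Return device paths of whole disks that have no partitions,
    no file system, and no mount point (heuristic for local disks)."""
    candidates = []
    for dev in json.loads(lsblk_json).get("blockdevices", []):
        if dev.get("type") != "disk":
            continue  # skip loop devices, partitions, LVM volumes, etc.
        # A disk with partitions, a file system, or a mount point is
        # probably the OS disk or is already in use.
        if dev.get("children") or dev.get("fstype") or dev.get("mountpoint"):
            continue
        candidates.append("/dev/" + dev["name"])
    return candidates


def list_block_devices() -> str:
    """Run lsblk and return its JSON output (Linux only)."""
    result = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,TYPE,FSTYPE,MOUNTPOINT"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On an instance, `find_unformatted_disks(list_block_devices())` would return the candidate devices to combine into a volume.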

Here is how some cloud providers implement local disks:

  • AWS provides Instance Store that is opt-in for some instance types but always bundled with others. Capacity varies from ~60 GB to ~336 TB
  • Azure provides one Temporary Disk with most instance types. Capacity is not documented (?) but apparently varies from 16 GB to several terabytes
  • GCP provides Local SSDs that are opt-in for some instance types but always bundled with others. Capacity varies from 375 GB to 36 TB
  • OCI provides Local Disks with some instance types. Capacity varies from ~4 TB to ~80 TB
  • Vultr provides multiple disks for all bare metal instances.
  • DigitalOcean (and possibly AMD Developer Cloud) provides scratch disks for some GPU droplets.

Problem

dstack ignores local disks, so dstack users cannot benefit from their performance and capacity, even though they pay for them.

Possible solutions

Solution 1 — create an LVM volume over the local disks and use it as docker's data_root

This way, users will benefit from local disks' performance automatically, with no configuration or special handling needed. However, if users request more disk capacity in the run configuration than local disks have to offer, dstack will have to store data_root on the OS disk as usual, i.e. the local disks will still be ignored.
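A minimal sketch of how Solution 1 might look, assuming the local disk devices and their sizes are already known. The function names, the volume group and logical volume names, and the `/mnt/local-disks/docker` path are all hypothetical. The LVM commands are only built as argument lists here, not executed, and the Docker daemon option is spelled `data-root` in `daemon.json`.

```python
import json


def choose_data_root(requested_gb, local_disks_gb,
                     local_path="/mnt/local-disks/docker"):
    """Return the local-disk path if the combined local capacity covers
    the requested size; return None to fall back to the OS disk.

    Note: usable capacity is slightly below the raw sum because of LVM
    and file system overhead; this sketch ignores that."""
    if local_disks_gb and sum(local_disks_gb) >= requested_gb:
        return local_path
    return None


def lvm_commands(devices, vg="local_vg", lv="local_lv"):
    """Build commands that join the devices into one ext4-formatted volume."""
    return [
        ["pvcreate", *devices],                        # mark disks as LVM physical volumes
        ["vgcreate", vg, *devices],                    # group them into one volume group
        ["lvcreate", "-n", lv, "-l", "100%FREE", vg],  # one logical volume using all space
        ["mkfs.ext4", f"/dev/{vg}/{lv}"],              # local disks usually have no file system
    ]


def daemon_config(data_root):
    """Render a Docker daemon.json that points data-root at the given path."""
    return json.dumps({"data-root": data_root}, indent=2)
```

With two 375 GB disks, `choose_data_root(100, [375, 375])` selects the local volume, while `choose_data_root(1000, [375, 375])` returns `None`, which is the fallback case where the local disks stay unused.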

Solution 2 — create an LVM volume over the local disks and mount it to a directory within the container

This way, users will be able to use both the fixed local disks and the configurable OS disk, i.e. have flexible disk capacity. However, the environment will be different on instances with and without local disks, so users' code will have to be adjusted to use local disks.
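A sketch of what Solution 2 could mean in practice, under the assumption that the volume is bind-mounted at a fixed, documented path. The path `/mnt/local-disks` and both function names are hypothetical. The first function shows the extra `docker run` arguments on the host side, and the second shows the kind of adjustment user code would need in order to work on instances with and without local disks.

```python
import os
import tempfile

# Hypothetical mount point inside the container; not an existing dstack path.
LOCAL_DISKS_DIR = "/mnt/local-disks"


def docker_volume_args(host_dir, container_dir=LOCAL_DISKS_DIR):
    """Return `docker run` bind-mount arguments, or an empty list when
    the instance has no local-disk volume (host_dir is None)."""
    if host_dir is None:
        return []
    return ["-v", f"{host_dir}:{container_dir}"]


def scratch_dir(local_dir=LOCAL_DISKS_DIR):
    """User-side helper: prefer the local-disk directory when it exists
    and is writable, otherwise fall back to the system temp directory."""
    if os.path.isdir(local_dir) and os.access(local_dir, os.W_OK):
        return local_dir
    return tempfile.gettempdir()
```

User code would then write caches or intermediate data under `scratch_dir()` instead of hard-coding a path.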

jvstme, Jun 24 '24 12:06

I think Solution 2 is perfectly fine as long as the local disks' mount point is clearly documented.

r4victor, Jun 24 '24 12:06

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Aug 05 '24 01:08

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

github-actions[bot], Feb 14 '25 01:02