[Feature]: Support Runpod network volumes
Problem
In order to get the most out of Runpod deployments, it would be amazing to have support for things like network storage, selecting specific data centers, or specifying a template_id that includes a lot of the existing configuration.
The create_pod function in the api_client of the runpod backend accepts these parameters, namely template_id, data_center_id, and network_volume_id. However, when they are defined in a configuration, e.g. as example.dstack.yml:
type: task
spot_policy: auto
template_id: runpod-torch-v21
data_center_id: EU-RO-1
backends: [runpod]
dstack run . -f example.dstack.yml fails with:
3 validation errors for RunConfigurationRequest
__root__ -> TaskConfigurationRequest -> data_center_id
extra fields not permitted (type=value_error.extra)
__root__ -> TaskConfigurationRequest -> template_id
extra fields not permitted (type=value_error.extra)
__root__ -> TaskConfigurationRequest -> __root__
Either `commands` or `image` must be set (type=value_error)
There are 2 problems with this:
- It appears that configuration values such as template_id, data_center_id, and network_volume_id are not picked up as valid variables.
- On a philosophical level, there's a question of whether image or command should be required in the dstack task itself if a runpod template is used (i.e., there is a template_id reference), as that template will already define the image and command. My biased view is that the template should override what's in the dstack configuration, but either way is workable, so it has little practical importance and may come down to what is more suitable according to the principles of the dstack architecture.
Having support for (1) would be incredibly helpful, as it would enable network volume usage on runpod, which in turn makes dstack usable for large(r) scale deployments where downloading remote models for each instance is too expensive.
Solution
Add support for runpod variables to the dstack configuration. Pass those variables to the runpod backend and the create_pod function.
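To make the proposal concrete, here is a minimal sketch of how the Runpod-specific keys could be separated from the rest of a run configuration and turned into keyword arguments for create_pod. It is not dstack's actual implementation: RunpodOptions, split_runpod_options, and create_pod_kwargs are hypothetical names invented for illustration, and the only thing taken from the issue is that create_pod accepts template_id, data_center_id, and network_volume_id.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional, Tuple


@dataclass
class RunpodOptions:
    """Hypothetical holder for Runpod-specific settings (not a dstack class)."""

    template_id: Optional[str] = None
    data_center_id: Optional[str] = None
    network_volume_id: Optional[str] = None


def split_runpod_options(config: Dict[str, Any]) -> Tuple[RunpodOptions, Dict[str, Any]]:
    """Separate the Runpod-specific keys from the rest of the configuration."""
    runpod_keys = {f.name for f in fields(RunpodOptions)}
    options = RunpodOptions(**{k: v for k, v in config.items() if k in runpod_keys})
    rest = {k: v for k, v in config.items() if k not in runpod_keys}
    return options, rest


def create_pod_kwargs(options: RunpodOptions) -> Dict[str, str]:
    """Build keyword arguments for create_pod, skipping values that were not set."""
    return {k: v for k, v in asdict(options).items() if v is not None}


# The configuration from example.dstack.yml above, as a plain dict.
config = {
    "type": "task",
    "spot_policy": "auto",
    "template_id": "runpod-torch-v21",
    "data_center_id": "EU-RO-1",
    "backends": ["runpod"],
}

options, rest = split_runpod_options(config)
kwargs = create_pod_kwargs(options)
print(kwargs)  # these would be forwarded to create_pod(..., **kwargs)
```

In this sketch network_volume_id is simply omitted when it is not set, so existing configurations would keep working unchanged.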
Workaround
None to my knowledge, but I recognize there's an open issue for general volume support (https://github.com/dstackai/dstack/issues/1158) which would alleviate some of these pains. However, supporting these configuration variables in general seems like a quick win to increase runpod adoption.
Would you like to help us implement this feature by sending a PR?
No
@dinosaursarecool Thank you very much for the request. Here are a few questions that may help us move forward with this:
- data_center_id. AFAIK, dstack supports this via regions:
type: task
spot_policy: auto
regions: [EU-RO-1]
backends: [runpod]
- network_volume_id: this feature is planned as part of https://github.com/dstackai/dstack/issues/1158
First, we'll support AWS and GCP, and after that we're happy to support RunPod too!
- As to template_id, is there anything you need template_id for that dstack doesn't support? I wonder why you may need to use templates. You can specify everything via commands and your repo files. Please let me know!
@peterschmidt85 Thanks, got it. Yeah, I think everything should be achievable through the current dstack configuration except for volumes. So if volume support is solved in #1158, then I can see how we could consider template support to be superfluous.
@dinosaursarecool Do you mind if we update the title/description of this issue to focus on just volumes with RunPod?
@peterschmidt85 absolutely, updated the title
This issue is stale because it has been open for 30 days with no activity.
@dinosaursarecool, the support for runpod network volumes is in master. Give it a try! It will be coming in the next 0.18.7 release within two weeks.