[FeatureRequest] Support dynamic volume provisioning for distributed training job

Open zhan849 opened this issue 5 years ago • 5 comments

This is ported from https://github.com/kubeflow/tf-operator/issues/949 because we want to make this feature generic. Basically, we want to add a volumeClaimTemplate to the job spec so that workers can use customized volumes for scratch space.
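
To make the request concrete, here is a minimal sketch of what expanding a claim template into one PersistentVolumeClaim per worker could look like. It is not the operator's implementation: the function name `render_worker_pvcs`, the job name `mnist`, the `10Gi` size, and the naming scheme (modeled on how StatefulSets name claims created from `volumeClaimTemplates`) are all assumptions for illustration.

```python
import copy


def render_worker_pvcs(job_name, replicas, claim_template):
    """Expand one claim template into one PVC manifest per worker.

    Hypothetical helper: a real controller would create these objects
    through the Kubernetes API instead of returning plain dicts.
    """
    pvcs = []
    for index in range(replicas):
        # Deep-copy so each worker gets an independent manifest.
        pvc = copy.deepcopy(claim_template)
        pvc["apiVersion"] = "v1"
        pvc["kind"] = "PersistentVolumeClaim"
        metadata = pvc.setdefault("metadata", {})
        base = metadata.get("name", "scratch")
        # Assumed naming scheme: <template>-<job>-worker-<index>.
        metadata["name"] = f"{base}-{job_name}-worker-{index}"
        pvcs.append(pvc)
    return pvcs


# Example template: 10Gi of scratch space per worker (illustrative values).
template = {
    "metadata": {"name": "scratch"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

claims = render_worker_pvcs("mnist", 3, template)
for claim in claims:
    print(claim["metadata"]["name"])
```

Each worker pod would then mount the claim matching its own index, so scratch data is never shared between workers.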

/assign

zhan849 avatar Apr 19 '19 16:04 zhan849

Issue-Label Bot is automatically applying the label feature_request to this issue, with a confidence of 0.98. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

issue-label-bot[bot] avatar Apr 19 '19 16:04 issue-label-bot[bot]

@zhan849 May I ask whether there has been any progress on this feature?

xiaogaozi avatar Aug 21 '19 04:08 xiaogaozi

@xiaogaozi thanks for following up. Unfortunately, I have little bandwidth in Q3 to work on this, given several high-priority items I need to focus on for my company. I will likely get back to this in Q4. Is this blocking any release? If so, I'm OK with someone else taking over the implementation of the feature.

Apologies for the inconvenience.

zhan849 avatar Aug 21 '19 04:08 zhan849

We (Xiaohongshu) have recently come to rely on this feature, so it would be great if someone could continue the contribution.

xiaogaozi avatar Aug 21 '19 06:08 xiaogaozi

Just FYI, the Volcano job controller is working on a PVC-per-pod feature: https://github.com/volcano-sh/volcano/pull/703

xiaogaozi avatar Apr 20 '20 08:04 xiaogaozi