common
[FeatureRequest] Support dynamic volume provisioning for distributed training jobs
This is ported from https://github.com/kubeflow/tf-operator/issues/949 because we want to make this feature generic. Basically, we want to add a volumeClaimTemplate to the job spec so that workers can use customized volumes for scratch space.
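To make the proposal concrete, here is a minimal sketch of how per-worker claims could be derived from a template. It is not an existing Kubeflow API. The types VolumeClaimTemplate and ReplicaSpec and the helpers podName and claimNames are all hypothetical and greatly simplified (a real API would likely embed a full PersistentVolumeClaim spec). The sketch assumes pods are named <job>-<replica type>-<index>, as the Kubeflow operators name them, and borrows the StatefulSet convention of naming each claim <template>-<pod>.

```go
package main

import (
	"fmt"
	"strings"
)

// VolumeClaimTemplate is a HYPOTHETICAL, simplified stand-in for a PVC template.
// A real API would more likely embed a full PersistentVolumeClaim spec.
type VolumeClaimTemplate struct {
	Name             string // template name, e.g. "scratch"
	StorageClassName string // storage class that would drive dynamic provisioning
	Size             string // requested capacity, e.g. "100Gi"
}

// ReplicaSpec is a HYPOTHETICAL replica spec extended with claim templates.
type ReplicaSpec struct {
	Replicas             int
	VolumeClaimTemplates []VolumeClaimTemplate
}

// podName builds a "<job>-<replicaType>-<index>" pod name (assumed convention), lowercased.
func podName(jobName, replicaType string, index int) string {
	return strings.ToLower(fmt.Sprintf("%s-%s-%d", jobName, replicaType, index))
}

// claimNames returns one PVC name per (pod, template) pair, using StatefulSet-style
// "<template>-<pod>" naming so each worker gets its own provisioned volume.
func claimNames(jobName, replicaType string, spec ReplicaSpec) []string {
	var names []string
	for i := 0; i < spec.Replicas; i++ {
		pod := podName(jobName, replicaType, i)
		for _, t := range spec.VolumeClaimTemplates {
			names = append(names, t.Name+"-"+pod)
		}
	}
	return names
}

func main() {
	spec := ReplicaSpec{
		Replicas: 2,
		VolumeClaimTemplates: []VolumeClaimTemplate{
			{Name: "scratch", StorageClassName: "standard", Size: "100Gi"},
		},
	}
	// Prints one claim name per worker: scratch-mnist-worker-0, scratch-mnist-worker-1.
	for _, n := range claimNames("mnist", "Worker", spec) {
		fmt.Println(n)
	}
}
```

Deterministic names mean a restarted worker would look up its existing claim rather than provisioning a new one. Whether scratch claims should be deleted when the job finishes is a separate design question.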
/assign
Issue-Label Bot is automatically applying the label feature_request to this issue, with a confidence of 0.98. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!
@zhan849 May I ask whether there has been any progress on this feature?
@xiaogaozi Thanks for following up. Unfortunately, I have little bandwidth in Q3 to work on this, given several high-priority items I need to focus on for my company. I will likely get back to this in Q4. Is this blocking any release? If so, I'm OK with someone else taking over the implementation of this feature.
Apologies for the inconvenience.
We (Xiaohongshu) have recently come to rely on this feature, so it would be great if someone could continue the contribution.
Just FYI, the Volcano job controller is working on a PVC-per-pod feature: https://github.com/volcano-sh/volcano/pull/703