xgboost-operator

How to run distributed training from Kubeflow Pipelines SDK?

Open marrrcin opened this issue 5 years ago • 3 comments

The example linked in the README (https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist) shows that spawning a distributed training job requires running kubectl. I want to run distributed XGBoost training as part of a bigger Kubeflow pipeline. How can I achieve this? Is it possible to spawn a distributed job from the Python code itself or from the Kubeflow Pipelines SDK?

marrrcin avatar Feb 24 '20 10:02 marrrcin

Issue-Label Bot is automatically applying the labels:

Label Probability
question 0.86

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] avatar Feb 24 '20 10:02 issue-label-bot[bot]

I think you can run an XGBoostJob as part of Kubeflow Pipelines, similar to other Kubeflow operators, but I am not familiar enough with Kubeflow Pipelines to be sure. Try it out and let us know if you encounter any issues.

terrytangyuan avatar Feb 24 '20 15:02 terrytangyuan

I think it's related to https://github.com/kubeflow/pipelines/issues/973

pingsutw avatar Jun 21 '20 12:06 pingsutw
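
For anyone landing here with the same question, below is a minimal, untested sketch of one way to create an XGBoostJob from Python instead of kubectl: build the manifest as a plain dict and submit it with the official Kubernetes Python client's `CustomObjectsApi`. Nothing in this thread confirms it as the recommended approach. The API group, version, plural, and container name are assumptions based on the sample manifests in the xgboost-operator repo, so check them against the CRD installed in your cluster. The helper names (`replica_spec`, `build_xgboost_job`, `submit`), the image, and the args are hypothetical placeholders.

```python
import json

# Assumed values: taken from the sample manifests in the xgboost-operator
# repo; verify them against the CRD actually installed in your cluster.
GROUP = "xgboostjob.kubeflow.org"
VERSION = "v1alpha1"
PLURAL = "xgboostjobs"


def replica_spec(image, replicas, args):
    """Build one replica spec (Master or Worker). Hypothetical helper."""
    return {
        "replicas": replicas,
        "restartPolicy": "Never",
        "template": {
            "spec": {
                "containers": [
                    {
                        # Container name is an assumption copied from the samples.
                        "name": "xgboostjob",
                        "image": image,
                        "args": list(args),
                    }
                ]
            }
        },
    }


def build_xgboost_job(name, namespace, image, workers, args):
    """Return an XGBoostJob manifest as a plain dict. Hypothetical helper."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "XGBoostJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "xgbReplicaSpecs": {
                "Master": replica_spec(image, 1, args),
                "Worker": replica_spec(image, workers, args),
            }
        },
    }


def submit(job):
    """Create the custom resource via the Kubernetes Python client."""
    # Imported lazily so the manifest builder works without the client installed.
    from kubernetes import client, config

    # Assumes this runs inside a pod (for example, a pipeline step) whose
    # service account is allowed to create XGBoostJob resources.
    config.load_incluster_config()
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group=GROUP,
        version=VERSION,
        namespace=job["metadata"]["namespace"],
        plural=PLURAL,
        body=job,
    )


if __name__ == "__main__":
    # Placeholder image and args; replace with your own training container.
    job = build_xgboost_job(
        name="xgboost-dist-demo",
        namespace="kubeflow",
        image="example.com/my-xgboost-image:latest",
        workers=2,
        args=["--job_type=Train"],
    )
    print(json.dumps(job, indent=2))
```

Calling `submit(job)` from inside a pipeline step would create the resource, provided the step's service account has the right permissions. The Kubeflow Pipelines v1 SDK also has `kfp.dsl.ResourceOp`, which takes a manifest dict through its `k8s_resource` parameter, so the same dict could probably be passed there; I have not verified this with XGBoostJob.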