Data passing to Katib experiments as components of Kubeflow pipelines.
/kind feature
Describe the solution you'd like:
Enable data passing into a Katib component, from previous and/or to later Kubeflow pipeline components, without the use of persistent volumes, so that it can be used as part of a portable Kubeflow pipeline (KFP). This way, the Katib-based component maintains an isolated functionality, namely hyperparameter tuning, leaving data loading and processing to previous components. This would make it possible to automate the entire training flow and capitalise on other features of KFP, such as caching. An example of what a desired pipeline would look like can be found in the attached PDF.
Currently, the only way to implement this is by using a persistent volume, but that hampers the portability of the pipeline. It would be beneficial to be able to pass data directly to a Katib KFP component, even more so if the data passing abides by the KFP v2 data passing mechanism.
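A minimal sketch of the desired pipeline shape, using the KFP v2 SDK. Note that `katib_tune` is hypothetical: it stands in for the Katib component this issue requests, while `load_data` and `preprocess` use the existing KFP v2 artifact-passing API.

```python
from kfp import dsl
from kfp.dsl import Dataset, Input, Output

@dsl.component
def load_data(raw: Output[Dataset]):
    # Fetch the raw data and write it to the output artifact path.
    with open(raw.path, "w") as f:
        f.write("...")

@dsl.component
def preprocess(raw: Input[Dataset], train: Output[Dataset]):
    # Transform the raw artifact into training data.
    with open(raw.path) as src, open(train.path, "w") as dst:
        dst.write(src.read())

@dsl.component
def katib_tune(train: Input[Dataset]) -> str:
    # Hypothetical Katib component: run a hyperparameter-tuning
    # experiment against `train` and return the best parameters.
    # No such component exists today; this is the feature requested here.
    return "{}"

@dsl.pipeline(name="tune-pipeline")
def tune_pipeline():
    raw_task = load_data()
    prep_task = preprocess(raw=raw_task.outputs["raw"])
    katib_tune(train=prep_task.outputs["train"])
```

With this shape, KFP handles the data movement between steps, so the pipeline stays portable and features like caching apply to the data-preparation steps.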
Anything else you would like to add:
Love this feature? Give it a 👍 We prioritize the features with the most 👍
Attachment: KFP.pdf
What is the current status for integrating Katib into a Kubeflow pipeline?
I've noted that issues #1846 and #1914 are along the same lines, but I just wanted to know what the current best options are.
I have a training pipeline that involves components for (a) fetching data, (b) preprocessing the data, and (c) training. I would like to do hyperparameter tuning over the training component. From a first look at the Katib documentation, it appears not to have a native integration with Pipelines: you specify a container/command that does the training and fire up your Katib experiment, but it doesn't appear that the experiment can itself be a component without wrapping it somehow.
It seems my main options are:
- Have the container specified in my Katib YAML actually orchestrate the whole pipeline.
- Create a Pipeline component that runs Katib that runs the train script.
- Have my Pipeline only do the data preparation and then separately run Katib.
The first two seem overly complicated. The third approach seems the most natural, but to some extent undermines the point of using a pipeline in the first place, since the pipeline only strings together data downloading and preprocessing. It would be good to have a pipeline where the input is some data source and the final output is the best model from a hyperparameter tuning experiment.
Originally posted by @oadams in https://github.com/kubeflow/katib/issues/331#issuecomment-1221693899
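A rough sketch of what option (2) above could look like: a pipeline step whose body drives a Katib experiment through the `kubeflow-katib` Python SDK and returns the best hyperparameters. The client calls below (`create_experiment`, `is_experiment_succeeded`, `get_optimal_hyperparameters`) follow the v1beta1 `KatibClient` as I understand it; treat them as assumptions and verify against the SDK version you have installed.

```python
import time

from kubeflow.katib import KatibClient

def run_katib_experiment(experiment: dict, namespace: str = "kubeflow") -> dict:
    """Create a Katib Experiment from a spec dict, block until it
    finishes, and return the best hyperparameters found."""
    client = KatibClient()
    name = experiment["metadata"]["name"]
    client.create_experiment(experiment, namespace=namespace)
    # Naive polling loop; a real component would add a timeout.
    while not client.is_experiment_succeeded(name=name, namespace=namespace):
        time.sleep(30)
    return client.get_optimal_hyperparameters(name=name, namespace=namespace)
```

The caveat is that this only moves the wrapping problem into the component body: the training container referenced by the Experiment spec still cannot receive KFP artifacts directly, which is what this issue asks for.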
See this https://github.com/kubeflow/manifests/tree/master/tests/e2e
Katib experiments are run using the Katib launcher component, and the results are then used to launch a TFJob.
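Condensed, that pattern looks roughly like the sketch below (KFP v1 SDK). The component URL and argument names are my reading of the `katib-launcher` component in kubeflow/pipelines; check them against the linked e2e test before relying on them.

```python
from kfp import components, dsl

# Reusable Katib launcher component from kubeflow/pipelines.
katib_launcher_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/kubeflow/katib-launcher/component.yaml"
)

@dsl.pipeline(name="katib-then-train")
def katib_then_train(experiment_spec: str):
    # `experiment_spec` is the JSON-serialized Katib Experiment spec.
    katib_task = katib_launcher_op(
        experiment_name="tuning-example",
        experiment_namespace="kubeflow",
        experiment_spec=experiment_spec,
        experiment_timeout_minutes=60,
        delete_finished_experiment=False,
    )
    # The launcher's best-parameter-set output would then feed a
    # TFJob launcher component (omitted here).
```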
Hi,
This solution uses a VolumeOp, which makes the pipeline less portable. Ideally, the Katib component could get the data from the previous step in the same manner that Kubeflow Pipelines implements data passing from one component to another.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen