Kubeflow component integration with ML Metadata

Open richardsliu opened this issue 3 years ago • 9 comments

/kind feature

Why you need this feature: Kubeflow currently doesn't have a unified metadata/artifact management story beyond what's supported in KFP. For example, the concept of an "ML experiment" exists in training and hyperparameter tuning, but there is no way to track it across separate Kubeflow components. Having unified metadata tracking allows users to aggregate things like:

  • Experiment runs
  • Datasets
  • Metrics
  • Trained artifacts
  • Hyperparameter configurations
  • etc

Originally, Kubeflow covered this through the Metadata project, but that project has since been archived. Additional discussion on this can be found in issue #4955.

It would be great to revisit this problem and see if we can propose a unified interface for metadata and artifact storage, possibly by using ML Metadata (MLMD).

Describe the solution you'd like:

One problem with the original Kubeflow Metadata project is that it comes with its own storage backend using MySQL, which makes it heavyweight. We do not need to re-implement the storage backend, since MLMD already solves that problem. Instead, we can make MLMD an optional installation and write to it directly. This is what KFP is currently doing; see this link for the code.

If we can define a unified data model and interface, it should be possible to build a light-weight library on top of ML metadata. It can be an optional import for training jobs and hyperparameter tuning jobs.
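To make the idea of a unified data model more concrete, here is a minimal sketch of what such a library's interface might look like. Everything in it is hypothetical: the `Run` and `ExperimentTracker` names, the fields, and the in-memory dictionary are assumptions for illustration only. A real implementation would delegate storage to MLMD rather than hold the data in memory, and this sketch does not use the MLMD API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    """One run reported by a Kubeflow component (hypothetical data model)."""
    component: str  # which component produced the run, e.g. "training"
    name: str
    hyperparameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)
    artifact_uris: List[str] = field(default_factory=list)  # datasets, models


class ExperimentTracker:
    """Groups runs from different components under one experiment name.

    Hypothetical interface: the dict below is an in-memory stand-in for
    the storage backend, which a real library would hand off to MLMD.
    """

    def __init__(self) -> None:
        self._runs_by_experiment: Dict[str, List[Run]] = {}

    def log_run(self, experiment: str, run: Run) -> None:
        # Create the experiment on first use, then append the run to it.
        self._runs_by_experiment.setdefault(experiment, []).append(run)

    def runs(self, experiment: str) -> List[Run]:
        # Return a copy so callers cannot mutate the tracker's state.
        return list(self._runs_by_experiment.get(experiment, []))

    def best_run(self, experiment: str, metric: str) -> Optional[Run]:
        # Highest value of `metric` among runs that reported it, else None.
        candidates = [r for r in self.runs(experiment) if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric], default=None)


# Illustrative usage: a training job and a tuning trial share one experiment.
# All names, values, and the example URI are made up.
tracker = ExperimentTracker()
tracker.log_run(
    "mnist",
    Run("training", "job-1", {"learning_rate": 0.01}, {"accuracy": 0.97},
        ["s3://example-bucket/model-1"]),
)
tracker.log_run(
    "mnist",
    Run("hyperparameter-tuning", "trial-7", {"learning_rate": 0.1},
        {"accuracy": 0.91}),
)
best = tracker.best_run("mnist", "accuracy")
```

The point of the sketch is that both components write to the same experiment through one small interface, so runs, metrics, and artifacts can be aggregated in one place regardless of which component produced them.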

richardsliu avatar Apr 11 '22 22:04 richardsliu

/priority p1 /kind feature

jbottum avatar Apr 11 '22 23:04 jbottum

Thank you Richard for your proposal.

I think it will be beneficial if more Kubeflow components want to adopt MLMD. The questions I have are:

  1. Are we looking for a mechanism that only groups objects across different Kubeflow components? Or do we also provide a mechanism for Kubeflow components to consume MLMD?
  2. How do we guarantee MLMD version consistency across Kubeflow components?

Note: MLMD has become a hard dependency of KFP in KFP v2. We are not only writing to MLMD; we are also reading from MLMD for status updates.

Note: Can external add-ons also use MLMD? For example, can KServe use MLMD? If so, how do we design a client that can be adopted by both Kubeflow components and add-ons?

Note: Once we have a proper proposal, you can make use of https://github.com/kubeflow/community/tree/master/proposals by creating a PR to this folder.

zijianjoy avatar Apr 12 '22 03:04 zijianjoy

This will be a great value add

johnugeorge avatar Apr 12 '22 06:04 johnugeorge

I think this is very dangerous because MLMD is not yet separated per namespace (see https://github.com/kubeflow/pipelines/issues/4790). It will lower the security standards even further if more components break namespace isolation.

juliusvonkohout avatar Apr 12 '22 15:04 juliusvonkohout

In general, I think this is a great proposal. To me, this has been one of the bigger gaps in Kubeflow ever since the previous attempt was archived. There are details to be worked out, as @zijianjoy and @juliusvonkohout mention, but they're not impossible.

What other requirements do people envision for this? I agree with @juliusvonkohout that whatever we do, it should at least have an option for user isolation. Whether it is completely isolated or we maintain two stores (one shared and one namespaced) is debatable. I believe @zijianjoy had some good comments about that and about maintaining backward compatibility.

ca-scribner avatar Apr 12 '22 16:04 ca-scribner

The ability to track experiments' metadata in a centralized place dedicated to that purpose would be great 👍 At the moment, I'm not sure what to use for such audit/governance/tracking activities without the help of external tools. At the same time, I don't even want to try a mix of Kubeflow and MLflow, since that looks like over-engineering to me when there could be built-in functionality for it.

rustam-ashurov-mcx avatar May 27 '22 14:05 rustam-ashurov-mcx