Proposal: ML Experience Working Group

Open ederign opened this issue 10 months ago • 25 comments

Hi everyone! 👋

After brainstorming with some community members about how to improve the Kubeflow User/Developer Experience for Data Scientists and ML practitioners, I decided to go one step further and start a formal discussion and propose a new IDE working group and its initial roadmap.

The IDE Working Group (potentially, Kubeflow Jupyter Extension WG) will be responsible for developing and integrating IDE-based tools and extensions to provide a streamlined user experience to data scientists and machine learning practitioners on Kubeflow.

WG IDE Charter

The IDE Working Group is responsible for developing and integrating IDE-based tools and extensions to provide a streamlined user experience to data scientists and machine learning practitioners on Kubeflow.

This charter adheres to the conventions, roles, and organization management outlined in wg-governance.

Scope

The IDE Working Group focuses on developing, maintaining, and improving tools and extensions that support the workflows of data science and machine learning practitioners within Kubeflow. The group is dedicated to delivering a high-level, seamless experience integrated with the IDE of choice across multiple Kubeflow components.

In scope

Code, Binaries, and Services

  1. Development of Kubeflow JupyterLab extensions that provide simple abstractions and UX to interact with the most common Kubeflow components (e.g., pipelines, hyperparameter tuning) and shorten the time to value for practitioners comfortable with Jupyter. These extensions will focus on the most used Kubeflow components, such as:

    • Pipelines;
    • Training Operator & Katib;
    • Model Registry;
    • Model Serving (KServe);
    • Feast
  2. Promote the reusability of UI components from other Kubeflow UIs into the IDE (e.g., rendering a pipeline graph inside the JupyterLab environment) by establishing a shared contract between the IDE WG and the wider Kubeflow community. 

  3. Develop a Python SDK to simplify operationalization across Kubeflow components and provide a "one-stop-shop" for practitioners who want easy access to Kubeflow services. The SDK also provides the groundwork for the IDE extension automation and workflows.

    • Create a single installation and configuration layer for users interacting programmatically with the Kubeflow ecosystem via SDKs.
    • The "common" SDK is not meant to replace individual components' SDKs but rather to offer a unified access layer to simplify dependency management and shared configuration (like authorization).
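To illustrate the "unified access layer" idea, here is a minimal Python sketch. It assumes one shared configuration object is handed to per-component client factories, so the common SDK wraps the individual SDKs rather than replacing them. Every name in it (`SharedConfig`, `KubeflowClient`, `FakePipelinesClient`, the example host and the `/pipelines-stub` path) is hypothetical and does not correspond to any existing Kubeflow package.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SharedConfig:
    """Settings every component client needs (field names are hypothetical)."""
    host: str
    namespace: str
    token: str

    def auth_headers(self) -> Dict[str, str]:
        # One place to build authorization headers for all components.
        return {"Authorization": f"Bearer {self.token}"}


class KubeflowClient:
    """Hypothetical unified access layer.

    It only passes the shared config to per-component client factories;
    it does not reimplement the component SDKs themselves.
    """

    def __init__(self, config: SharedConfig) -> None:
        self.config = config
        self._factories: Dict[str, Callable[[SharedConfig], object]] = {}
        self._clients: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[SharedConfig], object]) -> None:
        self._factories[name] = factory

    def component(self, name: str) -> object:
        # Create each component client lazily and reuse it afterwards.
        if name not in self._clients:
            if name not in self._factories:
                raise KeyError(f"no client registered for component {name!r}")
            self._clients[name] = self._factories[name](self.config)
        return self._clients[name]


@dataclass
class FakePipelinesClient:
    """Stand-in for a real component SDK client (illustration only)."""
    config: SharedConfig

    def endpoint(self) -> str:
        # Made-up path, used only to show that the shared host is reused.
        return f"{self.config.host}/pipelines-stub"


cfg = SharedConfig(host="https://kubeflow.example.com", namespace="team-a", token="abc")
client = KubeflowClient(cfg)
client.register("pipelines", FakePipelinesClient)
pipelines = client.component("pipelines")
```

In this sketch a user configures authorization once and every registered component client receives the same `SharedConfig`; how the real SDK would discover and version the component SDKs is left open.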

Guiding Principles

  • Synergy among Kubeflow Working Groups: Collaborate with other WGs to promote reusability of UI components from other Kubeflow UIs to create a single UX between Jupyter IDE and Kubeflow Central Dashboard;
  • Collaboration with other open-source IDE projects (like Jupyter and VSCode) to promote the creation and reusability of open standards for AI/ML tools (protocols, communication exchange, file formats, etc.) and plugins. The aim of this group is to actively participate in the development of these standards to include Kubeflow in a broader ecosystem of interoperable tools.

Cross-cutting and Externally Facing Processes

  • Collaboration with other Kubeflow WGs, including WG Notebooks, WG Pipelines, WG Training, and WG Serving, ensures that IDE tools are interoperable across different stages of the ML lifecycle.
  • Coordination with the release teams to align updates in IDE tools with broader Kubeflow release schedules.

Out of scope

  • Building and maintaining Notebook/Workspaces images (this falls under the WG Notebooks).

Working Group Roadmap Proposal

Vision

Development of Kubeflow JupyterLab extensions that provide simple abstractions and UX to interact with the most common Kubeflow components (e.g., pipelines, hyperparameter tuning) and shorten the time to value for practitioners comfortable with Jupyter. These extensions will focus on the most used Kubeflow components, such as Pipelines, Training Operator & Katib, Model Registry, Model Serving (KServe), Feast, etc.

Phase 1 - Establish baseline (XX Months)

Goal: Baseline/starting point for Kubeflow IDE Extension

This phase will consist of three main tasks:

  • Work on kubeflow-kale/kale to make it functional with KFP v2. The goal is to demo a successful notebook run with the latest version of KFP.
  • Re-introduce Elyra add-on support in Kubeflow. The goal is to demo visual pipeline authoring compatible with the latest version of KFP.
  • Explore the synergy between the Kubeflow Jupyter Extension and Jupyter Scheduler. We strive to build a close partnership between this working group and Jupyter upstream, and even to consolidate our efforts.

Task breakdown:

Kale: Note: @StefanoFioravanzo started this issue https://github.com/kubeflow/community/issues/730 and got great feedback and traction from the community.

  • Create a map of existing features and capabilities.
  • Upgrade dependencies to resolve CVEs and update deprecated modules
  • Align the internal API with KFP v2 
  • Update jupyter notebook docker images
  • Demo!
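To make the notebook-to-pipeline idea behind the Kale demo concrete, here is a minimal, stdlib-only Python sketch of the general technique: grouping tagged notebook code cells into steps and ordering the steps by their declared dependencies. The tag scheme (`step:<name>`, `prev:<name>`, `skip`) and the function names are assumptions made for illustration. This is not Kale's actual code or API, and it stops short of generating or submitting a KFP v2 pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def notebook_to_steps(nb: dict):
    """Group code cells into steps using (hypothetical) cell tags.

    Returns (sources, deps): the source snippets per step, and the set of
    upstream step names per step.
    """
    sources, deps = {}, {}
    current = None
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        if "skip" in tags:
            continue
        for tag in tags:
            if tag.startswith("step:"):
                current = tag[len("step:"):]
        if current is None:
            continue  # untagged cells before the first step are ignored
        sources.setdefault(current, [])
        deps.setdefault(current, set())
        deps[current].update(t[len("prev:"):] for t in tags if t.startswith("prev:"))
        src = cell.get("source", "")
        # Notebook JSON stores cell source as a string or a list of lines.
        sources[current].append("".join(src) if isinstance(src, list) else src)
    return sources, deps


def execution_order(deps: dict) -> list:
    """Order steps so that every step runs after the steps it depends on."""
    return list(TopologicalSorter(deps).static_order())


# A tiny in-memory notebook in the nbformat v4 JSON layout.
EXAMPLE_NB = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["step:load"]},
         "source": ["data = [1, 2, 3]\n"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["## Training"]},
        {"cell_type": "code", "metadata": {"tags": ["step:train", "prev:load"]},
         "source": "model = sum(data)\n"},
        {"cell_type": "code", "metadata": {}, "source": "print(model)\n"},
        {"cell_type": "code", "metadata": {"tags": ["step:evaluate", "prev:train"]},
         "source": "score = model / len(data)\n"},
    ]
}

sources, deps = notebook_to_steps(EXAMPLE_NB)
order = execution_order(deps)
```

In this sketch an untagged code cell is folded into the most recent step, so the `print(model)` cell becomes part of `train`. A KFP v2 integration would then turn each step into a pipeline component, which is the part the task list above still has to address.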

Elyra: Note: This work is already in progress by my group at Red Hat, together with the Elyra community.

  • (Done) Upgrade dependencies to resolve CVEs and update deprecated modules on Jupyter 4.x
  • (Done) Fix Elyra 4.x build
  • (Done) Migrate Elyra extensions to support JupyterLab 4.2.5 
  • PR and part of Elyra 4
  • (WIP) Align Elyra 4.x with KFP v2 (PR: https://github.com/elyra-ai/elyra/pull/3273)
  • As soon as Elyra releases 4.x, update Kubeflow docs to support the add-on https://www.kubeflow.org/docs/external-add-ons/elyra/introduction/ 
  • Integrate Elyra with Jupyter Notebook docker images on Kubeflow Notebooks.
  • Demo!

Jupyter Scheduler

  • Demonstrate the capability of Jupyter Scheduler extension for Notebook Workflows.
  • Discuss how we can consolidate efforts to build a unified solution for Notebook Workflows.

Phase 2 - Code Migration (XX Months)

Goal: code consolidated within the Kubeflow GitHub organization with proper code structure and naming

Phase 1 focused on establishing a baseline by demoing Kale and Elyra integrations successfully. In this phase we want to consolidate the Kale codebase under the Kubeflow organization. This new structure will allow us to work on top of Kale and iteratively build the new IDE experience for Kubeflow. Elyra will continue to be the interim solution for low-code visual pipeline authoring.

  • Migrating kubeflow-kale/kale to kubeflow/XXX - naming of the repository to be discussed with the Kubeflow community. This new repository will house everything related to Kubeflow IDE plugins and extensions.

Phase 3 - Enhance IDE extension  (XX Months)

Goal: Add visual authoring and runtime pipeline visualization to the Kale baseline. With these new features Kubeflow can provide both a notebook-based and a visual, drag-and-drop pipeline authoring experience. We are also planning to provide the same visualization look and feel both in the IDE and on the Kubeflow Central Dashboard.

Long-term plan

Goal: Kubeflow JupyterLab Extension MVP will provide a streamlined user experience to data scientists and machine learning practitioners across all components of the Kubeflow ecosystem.

ederign avatar Jan 29 '25 16:01 ederign

CC @kubeflow/kubeflow-steering-committee @StefanoFioravanzo @andreyvelich

This proposal submission is a collaboration between @StefanoFioravanzo, @andreyvelich, and myself. We also got helpful feedback from multiple other community members.

ederign avatar Jan 29 '25 16:01 ederign

This proposal is also related to the 'SDK discussion' on https://github.com/kubeflow/training-operator/issues/2402#issuecomment-2619160006

ederign avatar Jan 29 '25 16:01 ederign

@ederign thanks for migrating our notes and creating the issue! Looking forward to starting these efforts and can't wait to hear feedback from the community

StefanoFioravanzo avatar Jan 29 '25 16:01 StefanoFioravanzo

cc @zsailer @bigsur0 @shravan-achar @akshaychitneni

andreyvelich avatar Jan 29 '25 16:01 andreyvelich

Thanks for the well-written proposal. Some of these items align very well with the mission of the Elyra project. Given the synergy, it might be a good idea to explore how we could do some of this in the context of Jupyter/Elyra, particularly as we are all projects related to the Linux Foundation. Please let me know if any specific meetings are happening in this area.

cc @caponetto @shalberd @romeokienzler

lresende avatar Jan 29 '25 20:01 lresende

@lresende absolutely! We still need to wait for broader feedback from the community about the proposal, but if we agree to proceed, I'll make sure to invite Elyra folks to the discussions.

ederign avatar Jan 29 '25 22:01 ederign

I think this is a great idea and will enhance the overall UX with Kubeflow! I'd be happy to help out with any of the initiatives.

Griffin-Sullivan avatar Jan 30 '25 14:01 Griffin-Sullivan

Really detailed proposal, thank you very much for that! From my experience at Pepsico, Data Scientists often struggle to get familiar with Kubeflow, and companies typically need to develop a tool or library to help them use it effectively. Once implemented, this could definitely accelerate adoption.

milosjava avatar Jan 31 '25 19:01 milosjava

I think it's a really great initiative that will improve Kubeflow usability. And thank you so much for the detailed explanation, great work! I would really like to help with this initiative.

tarekabouzeid avatar Feb 04 '25 20:02 tarekabouzeid

Hi Folks, I propose a new name for this Working Group: ML Experience, given that we will develop many tools (Jupyter Extensions, SDK, re-usable UI components) that streamline the ML Engineer experience. What does the community think about this?

andreyvelich avatar Feb 14 '25 16:02 andreyvelich

@andreyvelich before focusing on the name itself - do you confirm you are ok with the charter and the proposed action plan? Don't want to get hung up on naming in case there are aspects of the proposal that need to be discussed.

If the proposal looks ok, then let's discuss naming.

StefanoFioravanzo avatar Feb 15 '25 08:02 StefanoFioravanzo

Sure, that sounds good to me @StefanoFioravanzo! In any case, let's talk about it at the next Kubeflow Community Call and convert this proposal into a PR in kubeflow/community.

andreyvelich avatar Feb 17 '25 15:02 andreyvelich

Thanks everyone for all the input here. I just submitted a proposal to the Kubeflow community: https://github.com/kubeflow/community/pull/824

ederign avatar Feb 18 '25 17:02 ederign

@ederign thank you for working on this proposal. I love the idea of a user-centric approach that looks at how the different tools can make users' journeys easier, whether by integrating existing tools or building new ones. I'm interested in joining.

varodrig avatar Feb 26 '25 03:02 varodrig

@varodrig great! I would love your feedback at https://github.com/kubeflow/community/pull/824

ederign avatar Feb 26 '25 13:02 ederign

@ederign , could you provide some initial guidance or key resources to help me gain a better understanding of the project?

RonakSingh55 avatar Mar 09 '25 13:03 RonakSingh55

@ederign I would like to join the WG, if possible, please.

szaher avatar Mar 11 '25 15:03 szaher

@RonakSingh55 @szaher, that is great! We are discussing the official proposal of the working group here: https://github.com/kubeflow/community/pull/824

ederign avatar Mar 17 '25 11:03 ederign

Let's keep it open until we finalize the scope of the ML Experience WG. /retitle Proposal: ML Experience Working Group

andreyvelich avatar Mar 17 '25 20:03 andreyvelich

I just raised a new PR as a follow-up (FUP) with the changes requested on https://github.com/kubeflow/community/pull/824.

ederign avatar Mar 19 '25 15:03 ederign

Hi @ederign , @StefanoFioravanzo

I came across the opportunity to develop a JupyterLab Plugin for Kubeflow, and I’m highly interested in contributing to this project. With my experience in JavaScript, React, Python, and API integrations, I believe I can help create a seamless JupyterLab extension that integrates with Kubeflow Pipelines, Notebooks, Model Registry, and Training Operator.

I have experience in JupyterLab extensions, backend API development, and have worked on projects involving data processing, AI tools, and web applications. I am eager to modernize and consolidate existing solutions like Elyra, Kale, and Jupyter Scheduler into a unified plugin to enhance the Kubeflow ecosystem.

I would love to discuss how I can contribute effectively. Please let me know the next steps or if there’s any documentation I should review to get started.

Looking forward to your response!

Best regards, Abhishek kaul

Abhsihekkaul avatar Mar 25 '25 09:03 Abhsihekkaul

Hi @Abhsihekkaul! That is great, and I'm looking forward to collaborating with you! We are in the process of setting up a place for us to start gathering! As soon as I have the Slack channel, I'll let everybody here know!

ederign avatar Mar 25 '25 11:03 ederign

Ok @ederign

In the meantime, shall I create a prototype of the implementation, draft my GSoC proposal, and get a review from the team?

Abhsihekkaul avatar Mar 25 '25 11:03 Abhsihekkaul

Hi @ederign , @StefanoFioravanzo

I’m interested in contributing to the JupyterLab Plugin for Kubeflow and have started drafting my proposal for GSoC 2025 on this project.

Currently, I am pursuing my Master’s in Computer Science and am skilled in JavaScript, React, Python, and API integrations, with a strong focus on building scalable applications and intuitive user experiences.

What I have done to learn about Kubeflow:

  • [x] Completed reading Kubeflow documentation (Introduction, Architecture, and Components).
  • [x] Deployed a Kubernetes cluster locally using kind.
  • [x] Deployed Kubeflow using Kubeflow manifests.

Work in progress:

  • [x] Developing a sample Jupyter extension & learning about Jupyter widgets.
  • [x] Reviewing existing plugins/extensions like Elyra, Kale, and Jupyter Scheduler.

Looking forward to contributing to this project!

sudhathorat31 avatar Mar 27 '25 02:03 sudhathorat31

Hi @ederign, @StefanoFioravanzo, and everyone,

I’m Abdulrahman Omar, a data science student interested in improving my experience as an ML practitioner. I’ve reviewed the proposal in depth and explored the related technologies "Elyra, Kale, and Jupyter Scheduler" to understand how they currently interact within the Kubeflow ecosystem.

I'm excited about the vision of developing a unified JupyterLab plugin for Kubeflow. I would love to contribute to this effort, particularly in areas related to extension development and integration with Kubeflow components.

Please add me to the Slack channel or mailing list so I can stay in the loop and collaborate with the team.

Looking forward to working with you all!

Abdulrahmann-Omar avatar Apr 08 '25 15:04 Abdulrahmann-Omar

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 08 '25 00:07 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Jul 29 '25 00:07 github-actions[bot]