[FR] Add support for Kubeflow notebooks

Open AlexandreBrown opened this issue 2 years ago • 11 comments

Proposal Summary

I propose to add support for the open-source ML platform called Kubeflow.
Kubeflow is a pretty popular and open platform that covers the end-to-end ML workflow.
It provides Notebooks and can be installed in the cloud (e.g. AWS, Google Cloud) or on premises. It runs on Kubernetes, so it works anywhere Kubernetes runs.
A Kubeflow notebook can use JupyterLab or other IDEs; my team and I use JupyterLab. It would be great if we could launch the FiftyOne App from a Kubeflow Notebook.

Motivation

  • More & more people are moving to the cloud as a solution to streamline the tools used across teams & team members.
  • Adding the support to Kubeflow also means FiftyOne would get more exposure from the big Kubeflow community
  • In my specific case supporting Kubeflow Notebooks is valuable since we work in Kubeflow Notebooks every day. If we have to go back to a localhost notebook then it would break our flow as we would need to jump around multiple environments (from cloud to local to cloud to local...) and it would not be possible depending on what we work on (Kubeflow Notebooks give us access to resources we don't have locally etc).
  • Currently it does not seem to be supported when trying to launch the app from a Kubeflow Notebook.

What areas of FiftyOne does this feature affect?

  • [x] App: FiftyOne application
  • [ ] Core: Core fiftyone Python library
  • [x] Server: FiftyOne server

Details

The idea is to offer an experience in Kubeflow Notebooks similar to the one available on Google Colab.
In short, the ideal solution should make it possible to launch the FiftyOne App from a Kubeflow Notebook.

Approaches brainstorm

There are a lot of different approaches we can take.

  • Make changes internally so that "it just works" like it does for Google Colab.

  • Kubeflow supports adding third party applications so maybe there is a way to take advantage of that (see https://www.kubeflow.org/docs/components/central-dash/customizing-menu/ ).
    I am not a Kubernetes expert, but my intuition is that if we can expose the application (maybe via a VirtualService), that may be enough to add a link to it in the Kubeflow UI. Maybe we could have a small microservice-like app that runs the FiftyOne App and exposes a port.
    Here is an example of how to integrate another application (here it was done for MLFlow) with Kubeflow https://medium.com/dkatalis/kubeflow-with-mlflow-702cf2ebf3bf

  • Another approach could be to have a way to generate a random/temporary secured URL that is publicly accessible making FiftyOne app work virtually anywhere (not just Kubeflow Notebooks).

Other approaches probably exist as well; I am not a Kubernetes expert.

Willingness to contribute

The FiftyOne Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • [ ] Yes. I can contribute this feature independently.
  • [x] Yes. I would be willing to contribute this feature with guidance from the FiftyOne community.
  • [ ] No. I cannot contribute this feature at this time.

AlexandreBrown avatar Jun 21 '22 18:06 AlexandreBrown

Thanks for the feature request! @benjaminpkane is a busy guy, but he's the lead developer on the FiftyOne App, so he'd be the best point of contact on this when he has some bandwidth.

We had to make a small tweak to make the App work in Google Colab, and I suspect a similar smallish tweak would be possible to support Kubeflow notebooks.

brimoor avatar Jun 23 '22 23:06 brimoor

Hello @benjaminpkane , just circling back here, let me know what you think about this feature request and its feasibility.
Thanks

AlexandreBrown avatar Jul 05 '22 17:07 AlexandreBrown

Thanks for the interest @AlexandreBrown. I will take a look this week.

benjaminpkane avatar Jul 05 '22 22:07 benjaminpkane

Hi @benjaminpkane, I'd love to contribute to this. I am currently going through the source code.

        dataset (None): an optional :class:`fiftyone.core.dataset.Dataset` or
            :class:`fiftyone.core.view.DatasetView` to load
        view (None): an optional :class:`fiftyone.core.view.DatasetView` to
            load
        port (None): the port number to serve the App. If None,
            ``fiftyone.config.default_app_port`` is used
        address (None): the address to serve the App. If None,
            ``fiftyone.config.default_app_address`` is used
        remote (False): whether this is a remote session, and opening the App
            should not be attempted
        desktop (None): whether to launch the App in the browser (False) or as
            a desktop App (True). If None, ``fiftyone.config.desktop_app`` is
            used. Not applicable to notebook contexts
        height (None): an optional height, in pixels, at which to render App
            instances in notebook cells. Only applicable in notebook contexts
        auto (True): whether to automatically show a new App window
            whenever the state of the session is updated. Only applicable
            in notebook contexts
        config (None): an optional :class:`fiftyone.core.config.AppConfig` to
            control fine-grained default App settings

Does this mean that if I am able to expose FiftyOne's port on my Kubernetes cluster, the session would launch?
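For context, here is a minimal sketch of what serving the session on an exposed port might look like from inside a notebook pod. It uses only the `port`, `address` and `remote` parameters listed in the docstring above. The helper names `remote_launch_kwargs` and `launch_remote` are hypothetical, binding to `0.0.0.0` is an assumption about what the cluster networking needs, and 5151 is assumed to be the configured default App port.

```python
def remote_launch_kwargs(port=5151, address="0.0.0.0"):
    """Build keyword arguments for fiftyone.launch_app() in a remote pod.

    Hypothetical helper. Assumptions: the pod must listen on all
    interfaces (0.0.0.0) to be reachable through a Kubernetes Service or
    port-forward, and 5151 is the port being exposed.
    """
    return {
        "port": port,        # port the session server listens on
        "address": address,  # interface the session server binds to
        "remote": True,      # do not try to open the App locally
    }


def launch_remote(dataset=None, **overrides):
    """Launch a remote session; fiftyone is imported lazily on purpose."""
    import fiftyone as fo  # requires fiftyone to be installed in the pod

    return fo.launch_app(dataset, **remote_launch_kwargs(**overrides))
```

Calling `launch_remote()` in the notebook would start the session server, but the port still has to be reachable from the browser, for example through a Service, an ingress route, or a port-forward. Whether that alone is enough is exactly the open question here.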

josepholaide avatar Jul 08 '22 22:07 josepholaide

@josepholaide nice! Yes, the starting point here is how networking will work. It may be more than just the port. Let me look at this tomorrow with fresh eyes, and I will provide more details.

benjaminpkane avatar Jul 08 '22 23:07 benjaminpkane

Thank you, I will be expecting your feedback.

josepholaide avatar Jul 09 '22 18:07 josepholaide

Ok, pardon the delay. I spent some time trying to set up kubeflow for my own curiosity, but I'm not finished with that so I'll leave an outline of what it means to support a notebook environment in general.

Python

Notebook environments, like any environment, are controlled by sessions. In notebooks, though, the session must know what URL to display in output cells. If the environment follows the IPython API, things are fairly straightforward.

Anyway, the important function here is fiftyone.core.session/notebooks.display(). The context, e.g. IPYTHON or COLAB is checked, and a proper URL is constructed that points to the session server.

Noting your first question, remote notebooks require exposing the FiftyOne session server in addition to the Jupyter server over the network, so a Kubeflow environment will likely require extra networking as well.
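To make the idea concrete, here is a rough sketch of a context check followed by URL construction. It is not the FiftyOne implementation: `detect_context`, `build_app_url` and the `KUBEFLOW` context are hypothetical names. It assumes that Kubeflow notebook images export an `NB_PREFIX` environment variable holding the notebook's URL prefix, and that something like `jupyter-server-proxy` forwards `<prefix>/proxy/<port>/` to the session server.

```python
import os


def detect_context(environ=None):
    """Return a hypothetical context name for the current environment."""
    environ = os.environ if environ is None else environ
    # Assumption: Kubeflow sets NB_PREFIX, e.g. "/notebook/<namespace>/<name>"
    if environ.get("NB_PREFIX"):
        return "KUBEFLOW"
    return "IPYTHON"


def build_app_url(port, address="localhost", environ=None):
    """Build the URL an output cell would display for the session server."""
    environ = os.environ if environ is None else environ
    if detect_context(environ) == "KUBEFLOW":
        prefix = environ["NB_PREFIX"].rstrip("/")
        # Assumption: a Jupyter proxy extension serves <prefix>/proxy/<port>/
        return f"{prefix}/proxy/{port}/"
    # Plain IPython/Jupyter: point directly at the session server
    return f"http://{address}:{port}/"
```

For example, `build_app_url(5151, environ={"NB_PREFIX": "/notebook/team-a/nb"})` yields a relative path under the notebook prefix, so the browser reaches the session server through the same host and authentication as the notebook itself. Without such a proxy, the port would need its own route into the cluster.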

App

The other important part of the equation is making sure the App knows how to call the server if there is any non-standard networking. The two important functions here are getAPI() and setFetchFunction.

One other detail is that a memory history is currently used in notebook contexts instead of a browser history, which helps avoid path issues if the notebook runs through a routed proxy, e.g. databricks. See here

That's where to get started. Full support also requires screenshots, which involves replacing cells through the IPython display handle object (no need to worry about that now). If Kubeflow uses proper Jupyter notebooks, then it shouldn't be an issue. Let me know if you have any more questions!

benjaminpkane avatar Jul 11 '22 00:07 benjaminpkane

@benjaminpkane Regarding the Kubeflow setup, I can help you with that.
A good starting point to get up and running quickly on AWS is : https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/
Once someone comes up with a solution or prototype, it would also be interesting to check that it works with a production deployment, where the URL is not localhost but an actual domain (e.g. using a load balancer with Cognito on AWS: https://awslabs.github.io/kubeflow-manifests/docs/deployment/cognito/guide-automated/ ).

Let me know if you need more help with that, I had to go through the setups many times so I'm willing to help if needed, we can always chat on slack as well.

AlexandreBrown avatar Jul 11 '22 01:07 AlexandreBrown

I am currently working through the outlined steps, @benjaminpkane.

To set up Kubeflow in minutes, you can try the free trial of Kubeflow as a Service. It lasts 14 days. https://www.arrikto.com/kubeflow-as-a-service/

Also, I am using JupyterLab. Is that fine for setting up FiftyOne?

josepholaide avatar Jul 11 '22 06:07 josepholaide

@benjaminpkane following up on the Kubeflow deployment. I can assist with the kubeflow deployment setup.

josepholaide avatar Jul 14 '22 18:07 josepholaide

JupyterLab is fine. I'm happy to set up Kubeflow with Arrikto if/when there is something to test.

benjaminpkane avatar Jul 14 '22 19:07 benjaminpkane