fiftyone
[FR] Add support for Kubeflow notebooks
Proposal Summary
I propose adding support for Kubeflow, an open-source ML platform.
Kubeflow is a popular, open platform that covers the end-to-end ML workflow.
It provides notebooks and can be installed in the cloud (e.g. AWS, Google Cloud) or on premises; it runs on Kubernetes, so it runs anywhere Kubernetes runs.
When we run a Kubeflow notebook, we can use JupyterLab or other IDEs; my team and I use JupyterLab.
It would be great if we could launch the FiftyOne App from a Kubeflow Notebook.
Motivation
- More & more people are moving to the cloud as a solution to streamline the tools used across teams & team members.
- Adding support for Kubeflow also means FiftyOne would get more exposure from the big Kubeflow community.
- In my specific case, supporting Kubeflow Notebooks is valuable since we work in Kubeflow Notebooks every day. If we had to go back to a localhost notebook, it would break our flow, as we would need to jump between environments (from cloud to local to cloud to local...). Depending on what we are working on, it might not even be possible, since Kubeflow Notebooks give us access to resources we don't have locally.
- Currently, launching the App from a Kubeflow Notebook does not seem to be supported.
What areas of FiftyOne does this feature affect?
- [x] App: FiftyOne application
- [ ] Core: Core fiftyone Python library
- [x] Server: FiftyOne server
Details
The idea is to offer an experience similar to the one available on Google Colab, but for Kubeflow Notebooks.
In short, the ideal solution should allow launching the FiftyOne App from a Kubeflow Notebook.
Approaches brainstorm
There are a lot of different approaches we can take.
- Make changes internally so that "it just works" like it does for Google Colab.
- Kubeflow supports adding third-party applications, so maybe there is a way to take advantage of that (see https://www.kubeflow.org/docs/components/central-dash/customizing-menu/ ).
  I am not a Kubernetes expert, but my intuition is that if we can expose the application (maybe via a VirtualService), that might be enough to add a link to it in the Kubeflow UI. Maybe we can have a small microservice-like app that runs the FiftyOne App and exposes a port.
  Here is an example of how another application (MLflow, in this case) was integrated with Kubeflow: https://medium.com/dkatalis/kubeflow-with-mlflow-702cf2ebf3bf
- Another approach could be to generate a random/temporary secured URL that is publicly accessible, making the FiftyOne App work virtually anywhere (not just in Kubeflow Notebooks).
Other approaches probably exist as well; I am not a Kubernetes expert.
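To make the VirtualService idea a bit more concrete, here is a rough sketch of what such a manifest could look like, built as a Python dict and printed as JSON (which `kubectl apply -f` also accepts). Everything specific in it is an assumption and not something confirmed by FiftyOne or by this issue: the helper name `fiftyone_virtual_service`, the `/fiftyone/` path prefix, the Service host name, port 5151, and the `kubeflow/kubeflow-gateway` gateway used by a default Kubeflow install.

```python
import json
from typing import Any, Dict


def fiftyone_virtual_service(
    name: str,
    namespace: str,
    service_host: str,
    port: int = 5151,  # assumption: the port the FiftyOne server listens on
    prefix: str = "/fiftyone/",  # hypothetical path under the Kubeflow domain
) -> Dict[str, Any]:
    """Builds an Istio VirtualService manifest (hypothetical helper).

    Requests whose path starts with ``prefix`` are rewritten to ``/`` and
    routed to a Kubernetes Service assumed to sit in front of the App.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": ["*"],
            # assumption: the gateway name of a default Kubeflow install
            "gateways": ["kubeflow/kubeflow-gateway"],
            "http": [
                {
                    "match": [{"uri": {"prefix": prefix}}],
                    "rewrite": {"uri": "/"},
                    "route": [
                        {
                            "destination": {
                                "host": service_host,
                                "port": {"number": port},
                            }
                        }
                    ],
                }
            ],
        },
    }


manifest = fiftyone_virtual_service(
    name="fiftyone",
    namespace="kubeflow",
    service_host="fiftyone.kubeflow.svc.cluster.local",  # hypothetical Service
)
print(json.dumps(manifest, indent=2))
```

Whether the App behaves correctly behind a rewritten path prefix is an open question, so this only covers the routing half of the idea.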
Willingness to contribute
The FiftyOne Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- [ ] Yes. I can contribute this feature independently.
- [x] Yes. I would be willing to contribute this feature with guidance from the FiftyOne community.
- [ ] No. I cannot contribute this feature at this time.
Thanks for the feature request! @benjaminpkane is a busy guy, but he's the lead developer on the FiftyOne App, so he'd be the best point of contact on this when he has some bandwidth.
We had to make a small tweak to make the App work in Google Colab, and I suspect a similar smallish tweak would be possible to support Kubeflow notebooks.
Hello @benjaminpkane , just circling back here, let me know what you think about this feature request and its feasibility.
Thanks
Thanks for the interest @AlexandreBrown. I will take a look this week.
Hi @benjaminpkane, I'd love to contribute to this. I am currently going through the source code.
dataset (None): an optional :class:`fiftyone.core.dataset.Dataset` or
    :class:`fiftyone.core.view.DatasetView` to load
view (None): an optional :class:`fiftyone.core.view.DatasetView` to
    load
port (None): the port number to serve the App. If None,
    ``fiftyone.config.default_app_port`` is used
address (None): the address to serve the App. If None,
    ``fiftyone.config.default_app_address`` is used
remote (False): whether this is a remote session, and opening the App
    should not be attempted
desktop (None): whether to launch the App in the browser (False) or as
    a desktop App (True). If None, ``fiftyone.config.desktop_app`` is
    used. Not applicable to notebook contexts
height (None): an optional height, in pixels, at which to render App
    instances in notebook cells. Only applicable in notebook contexts
auto (True): whether to automatically show a new App window
    whenever the state of the session is updated. Only applicable
    in notebook contexts
config (None): an optional :class:`fiftyone.core.config.AppConfig` to
    control fine-grained default App settings
Does this mean if I am able to expose fiftyone's port to my kubernetes cluster, the session would launch?
@josepholaide nice! Yes, the starting point here is how networking will work. It may be more than just the port. Let me look at this tomorrow with fresh eyes, and I will provide more details.
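As a starting point for the port question, here is a minimal sketch of how the `port` and `address` arguments quoted above could be set so that the session server listens on all interfaces of the notebook pod. The helper names are hypothetical, and the value 5151 is an assumption that should be checked against `fiftyone.config.default_app_port`. This alone may not be enough to make the App reachable.

```python
from typing import Any, Dict, Optional

# Assumption: verify against fiftyone.config.default_app_port
ASSUMED_APP_PORT = 5151


def kubeflow_launch_kwargs(
    port: int = ASSUMED_APP_PORT, address: str = "0.0.0.0"
) -> Dict[str, Any]:
    """Returns launch arguments that bind the server to every interface
    of the pod, using a fixed port that a Kubernetes Service can target
    (hypothetical helper).
    """
    return {"port": port, "address": address}


def launch_in_pod(dataset: Optional[Any] = None, **overrides: Any):
    """Launches the App with the arguments above (hypothetical helper)."""
    # Imported lazily so the helper above can be used without FiftyOne
    import fiftyone as fo

    kwargs = {**kubeflow_launch_kwargs(), **overrides}
    return fo.launch_app(dataset, **kwargs)
```

Inside a notebook cell this would be called as `session = launch_in_pod(dataset)`; the URL shown in the output cell still has to resolve from the user's browser, which is the part that needs extra networking.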
Thank you, I will be expecting your feedback.
Ok, pardon the delay. I spent some time trying to set up Kubeflow out of curiosity, but I'm not finished with that, so I'll leave an outline of what it means to support a notebook environment in general.
Python
Notebook environments, like any environment, are controlled by sessions. In notebooks, though, the session must know what URL to display in output cells. If the environment follows the IPython API, things are fairly straightforward.
Anyway, the important function here is fiftyone.core.session/notebooks.display(). The context, e.g. IPYTHON or COLAB, is checked, and a proper URL is constructed that points to the session server.
Noting your first question, remote notebooks require exposing the FiftyOne session server in addition to the Jupyter server over the network, so a Kubeflow environment will likely require extra networking as well.
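To illustrate the kind of URL construction described above, here is a minimal sketch of what a Kubeflow branch might look like. It is not FiftyOne code: the function name `kubeflow_app_url` is hypothetical, and it rests on two assumptions, namely that the Kubeflow notebook pod sets the `NB_PREFIX` environment variable to the notebook's path prefix, and that `jupyter-server-proxy` is installed in the notebook image so that `<prefix>/proxy/<port>/` forwards to a local port.

```python
import os
from typing import Mapping, Optional


def kubeflow_app_url(port: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Returns the URL an output cell could point at (hypothetical helper).

    Args:
        port: the port the session server listens on inside the pod
        env: the environment to inspect; defaults to ``os.environ``
    """
    env = os.environ if env is None else env

    # Assumption: Kubeflow sets NB_PREFIX, e.g. /notebook/<namespace>/<name>
    prefix = env.get("NB_PREFIX", "").rstrip("/")
    if not prefix:
        # Not in a Kubeflow notebook: fall back to a plain local URL
        return f"http://localhost:{port}/"

    # Assumption: jupyter-server-proxy serves local ports under /proxy/<port>/
    return f"{prefix}/proxy/{port}/"


print(kubeflow_app_url(5151, {"NB_PREFIX": "/notebook/team-a/my-notebook"}))
```

A relative URL is returned in the Kubeflow case so that it resolves against whatever domain the notebook is served from, localhost or otherwise.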
App
The other important part of the equation is making sure the App knows how to call the server if there is any non-standard networking. The two important functions here are getAPI() and setFetchFunction.
One other detail is that a memory history is currently used in notebook contexts instead of a browser history, which helps avoid path issues if the notebook runs through a routed proxy, e.g. databricks. See here
That's where to get started. Full support also requires screenshots, which involves replacing cells through the IPython display handle object (no need to worry about that now). If Kubeflow uses proper Jupyter notebooks, then it shouldn't be an issue. Let me know if you have any more questions!
@benjaminpkane Regarding the Kubeflow setup, I can help you with that.
A good starting point to get up and running quickly on AWS is : https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/
Maybe once someone comes up with a solution/prototype, it would also be interesting to see if it works with a production deployment, where the URL won't be localhost but rather an actual domain (e.g. using a load balancer/Cognito for AWS: https://awslabs.github.io/kubeflow-manifests/docs/deployment/cognito/guide-automated/ ).
Let me know if you need more help with that. I had to go through the setup many times, so I'm willing to help if needed; we can always chat on Slack as well.
I am currently working through the outlined steps, @benjaminpkane.
To set up Kubeflow in minutes, you can try the free trial version of Kubeflow as a Service. It lasts 14 days. https://www.arrikto.com/kubeflow-as-a-service/
Also, I am using JupyterLab. Is that fine for setting up FiftyOne?
@benjaminpkane following up on the Kubeflow deployment. I can assist with the kubeflow deployment setup.
JupyterLab is fine. I'm happy to set up Kubeflow with Arrikto if/when there is something to test.