model-registry icon indicating copy to clipboard operation
model-registry copied to clipboard

py: add basic service URL resolver

Open antisaling opened this issue 1 year ago • 14 comments

Description

Adds a simple constructor that resolves cluster service URLs.

How Has This Been Tested?

Tested with the latest UI using a MR deployed directly through the settings panel.

The client then connects:

from model_registry import ModelRegistry

mr = ModelRegistry.from_service("modelregistry-sample", "isinyaaa")

and creates a new MV:

mr.register_model("test", "s3://catopia/meow.onnx", model_format_name="oooo", model_format_version="v1", version="123")

❗ IMPORTANT: to make this work with ODH deployments you need to create a clusterwide rolebinding for the current DSP service accounts, with the odh-dashboard role.

Change was reflected on the UI as expected:

image

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • [ ] The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • [ ] Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • [ ] The developer has manually tested the changes and verified that the changes work.
  • [ ] Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • [ ] The developer has added tests or explained why testing cannot be added.
  • [ ] Included any necessary screenshots or gifs if it was a UI change.
  • [ ] Verify that UI/UX changes conform the UX guidelines for Kubeflow.

antisaling avatar Sep 23 '24 19:09 antisaling

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign tomcli for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Sep 23 '24 19:09 google-oss-prow[bot]

cc: @dhirajsb @tarilabs @Al-Pragliola

antisaling avatar Sep 23 '24 20:09 antisaling

@Al-Pragliola any thoughts on potential amendments which would not require a rolebinding or anyway any changes to the permission settings?

tarilabs avatar Sep 24 '24 06:09 tarilabs

@Al-Pragliola any thoughts on potential amendments which would not require a rolebinding or anyway any changes to the permission settings?

mm no, I can't think of another way to let the notebook pod interact with the k8s api-server, but I think the odh-dashboard role is a bit too permissive, we should have an ad-hoc role with only the required permissions

Al-Pragliola avatar Sep 24 '24 14:09 Al-Pragliola

Thanks @Al-Pragliola , adding to https://github.com/kubeflow/model-registry/pull/421#issuecomment-2371425492, what about having some "fallbacks" in case that permission is not there? Do we believe we might have some options?

For example, attemping in case a list K8s resources in a set of known to be used namespaces, may require simpler/lower permissions? for example, listing MR CR instances in a set of known namespaces in case this require lower permission, or listing Services again in known namespaces, etc?

tarilabs avatar Sep 24 '24 15:09 tarilabs

@tarilabs if we can't get DSC it falls back to looking for the instance in the specified namespace, or the default (kubeflow for upstream, should be changed to odh-model-registries when merging midstream)

antisaling avatar Sep 24 '24 16:09 antisaling

should be changed to odh-model-registries when merging midstream

I don't think that is viable, since there will be only 1 "upstream" pypi of the MR python client?

Hence why I believe would have been preferable to adopt a fallback(s) approach

tarilabs avatar Sep 24 '24 17:09 tarilabs

IMO the Python library should not use any OpenShift calls directly. That brings in additional dependency on where the MR deployed into a client library. I suggest we feed this information through some kind of config or env to keep it simple, then think through how this config can be supplied. I am thinking like .kubeconfig wdyt?

rareddy avatar Sep 26 '24 13:09 rareddy

Yes, the DSC namespace discovery is an odh/rhoai feature, since there is no equivalent place to discover registries namespace in kubeflow. Although, the default MR manifests kind of make kubeflow namespace the default registry namespace.

We should keep things simple and allow this capability to discover the presence of DSC CR in the client library. The Python client can simply ignore that service discovery path if DSC resource type is not present and try to lookup registries using the hostname provided by users.

If we want to allow Python API to be service location agnostic, another way to do that would be by supporting an env variable for a default MR namespace, and also support specifying a direct service host. That way it can be also be injected into the notebook from a user supplied configmap (which could be cluster/env specific) if needed.

dhirajsb avatar Sep 30 '24 19:09 dhirajsb

BTW, the Python client can do this without binding to OpenShift specific APIs using the K8s unstructured generic API.

dhirajsb avatar Sep 30 '24 19:09 dhirajsb

The odh-dashboard role referenced by @isinyaaa is a workaround, not a requirement once Notebooks grants the right permissions midstream. That's being worked on by the odh notebooks team.

dhirajsb avatar Sep 30 '24 19:09 dhirajsb

Apologies, when I mentioned OpenShift in general I was referring to Kube. saying in general Python client should not make any assumptions on where the MR is hosted IMO. Thinking more in terms of what you said above

If we want to allow Python API to be service location agnostic, another way to do that would be by supporting an env variable for a default MR namespace, and also support specifying a direct service host. That way it can be also be injected into the notebook from a user supplied configmap (which could be cluster/env specific) if needed.

then make some other automation happen to provide the config/env based on environment.

rareddy avatar Oct 01 '24 03:10 rareddy

similar pattern by MLFlow https://mlflow.org/docs/latest/model-registry.html#id10

rareddy avatar Oct 01 '24 18:10 rareddy

similar pattern by MLFlow https://mlflow.org/docs/latest/model-registry.html#id10

I believe this is the link you wanted: https://mlflow.org/docs/latest/model-registry.html#databricks-unity-catalog-model-registry (the original link is an entry in a TOC)

Sounds nice and following up from https://github.com/kubeflow/model-registry/pull/447#discussion_r1787293244 , what about using something analogous to Viper on the Go side, like https://pypi.org/project/12factor-configclasses/ that would support said injection?

This way:

  • for upstream KF, we can provide them in the Manifest for example
  • for midstream, folks can rely on their own Operators making the env or CM

tarilabs avatar Oct 04 '24 12:10 tarilabs

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 10 '25 04:02 github-actions[bot]

This pull request has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Mar 03 '25 04:03 github-actions[bot]

For the records, the overall direction is to rely on configuration of the environment for any UX improvements in connecting the client, for example by means of Env variables.

This will ensure the design and architecture of the application to follow principles from https://12factor.net and similar.

A practical example can be seen in https://github.com/kubeflow/model-registry/pull/765 where the configuration of the S3 bucket is expected to be available from Env variables (but can always be supplied programmatically.

In pragmatic terms this means eventually we should amend the MR py client constructor to adopt similar approach if the connection URL is not provided.

Clearly, to exercise this option will require for all KF Component controllers to inject the desired "model registry" values as Env variables, a mechanism that might not be generally implemented in KF.

tarilabs avatar Mar 03 '25 07:03 tarilabs