give Model two tier naming hierarchy
Hi Folks,
Is your feature request related to a problem? Please describe.
Popular registries store entities using a two tier naming hierarchy. Examples include:
- GitHub: '<org_name>/<repo_name>' e.g. github.com/kubeflow/model-registry
- Docker Hub: '<org_name>/<container_name>' e.g. hub.docker.com/r/apache/airflow
- Weights and Biases: '<project_name>/<artifact_name>'
- Maven (Java ecosystem in general): '<domain_name>/<project_name>' e.g. mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core
- Artifactory (inherits from popular registries).
Today, the Model appears to only support Name.
Describe the solution you'd like
I suggest adding an organization field to the Model. I think that might mean introducing a new field here: https://github.com/kubeflow/model-registry/blob/a1b414da9e8e07a38f5aa8517925023ffca3dba1/internal/db/models/model_artifact.go#L20
Describe alternatives you've considered
I have considered:
- Doing nothing: There are examples of repositories where the artifacts just have a name: PyPI, Linux distros. Using just a name increases the likelihood of name collisions. With a single name it will be difficult to exchange models between this Registry and registries and tooling that require two tier naming.
- Make two tier optional: It looks like npm has an optional second tier. If a second tier name is added to the Model and required, users could choose to ignore it by providing a default. This is a simple workaround and avoids adding logic to make it truly optional.
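To make the "required, but ignorable via a default" alternative concrete, here is a minimal sketch. Everything in it is hypothetical: the `Model` struct is an illustrative stand-in (not the actual struct in model_artifact.go), and the `Organization` field, `DefaultOrganization` value, `NewModel`, and `QualifiedName` are names I made up for the example.

```go
package main

import "fmt"

// DefaultOrganization is a hypothetical placeholder used when the caller
// does not care about the second tier.
const DefaultOrganization = "default"

// Model is an illustrative stand-in, not the real Model Registry type.
type Model struct {
	Organization string // hypothetical new field
	Name         string
}

// NewModel fills in DefaultOrganization when organization is empty,
// so single-tier users never have to think about it.
func NewModel(organization, name string) Model {
	if organization == "" {
		organization = DefaultOrganization
	}
	return Model{Organization: organization, Name: name}
}

// QualifiedName returns the two tier "<organization>/<name>" form.
func (m Model) QualifiedName() string {
	return m.Organization + "/" + m.Name
}

func main() {
	fmt.Println(NewModel("kubeflow", "mnist").QualifiedName())
	fmt.Println(NewModel("", "mnist").QualifiedName())
}
```

With this shape the field is always populated in storage, but callers who only have a name can pass an empty organization.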
Additional context
I'm very excited to see Kubeflow develop a registry solution!
thank you @richardhenwood for reaching out!
For completeness: neither Docker Hub nor the OCI container specification really makes a distinction between <org_name>/ and <org_name>/<container_name>.
The repository doesn't necessarily follow a hierarchy within the registry, meaning a/b/c and a/b/c/d are both valid and distinct repositories, with no implied hierarchy between them.
from: ORAS website
You can also find it specified here as <name> and there is no concept of Organization.
In practice, however, by convention all OCI registries, including Docker Hub, treat the part before the first / as an Organization/Group, and for backward compatibility library is the default when just specifying busybox as an image reference.
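As a rough illustration of that convention (not of how any registry actually implements it), the sketch below splits a name at the first / and falls back to library when there is no slash. The function name `splitByConvention` is hypothetical, and the sketch deliberately ignores registry hosts, tags, and digests that a real image reference can carry.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultGroup mirrors the "library" default mentioned above.
const defaultGroup = "library"

// splitByConvention is a hypothetical helper: it treats everything before
// the first "/" as the group and everything after it as the rest of the name.
// Names without a "/" are placed in defaultGroup.
func splitByConvention(name string) (group, rest string) {
	if before, after, found := strings.Cut(name, "/"); found {
		return before, after
	}
	return defaultGroup, name
}

func main() {
	for _, name := range []string{"apache/airflow", "busybox", "a/b/c"} {
		group, rest := splitByConvention(name)
		fmt.Printf("%s -> group=%s rest=%s\n", name, group, rest)
	}
}
```

Note that a/b/c comes out as group a with rest b/c: the grouping is only a reading of the string, which matches the point that the spec itself implies no hierarchy.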
To your point, there is some discussion about Namespace here: https://github.com/kubeflow/community/pull/892
That clarified, the tenancy model you are seeking can be achieved by deploying another dedicated instance of Model Registry and giving your required groups access to it, which is the design guideline we implemented for Model Registry.
Hi @tarilabs ,
Thanks for your thoughtful response and background. It seems Docker might be uncharitably described as 'you can put a slash in the name to get a hierarchy' and not a 'real' two tier naming hierarchy. I didn't really appreciate this so it is interesting to know. This leads to a third option to get 'two tier naming hierarchy' with no changes on the Model definition: just use slashes in the name string.
I think this then comes down to: is there a significant benefit in adjusting models to add a 'real' two tier naming hierarchy? To me it feels like GitHub has a real model of Org/Repo, and this enables some really nice use cases, e.g. enterprise auth, fine-grained access control, and search scope. However, it may be that designing for GitHub scale is (quite sensibly) out of scope and a 'use-a-slash-in-the-name hierarchy' is Good Enough.
In my usage, I can anticipate this registry being made available as part of a Federated Registry system. We have artifacts across multiple registries, and Kubeflow Model-Registry would be another one. Today, our existing artifact registries predominantly use Org/Name pattern and I could achieve naming consistency using '/' in the name.
I didn't really appreciate this so it is interesting to know.
I didn't know either initially about docker registry/OCI registry so I'm sharing every chance I can because it's an interesting thing to know 🙃
our existing artifact registries predominantly use Org/Name pattern and I could achieve naming consistency using '/' in the name
The shortcoming of this approach, which I do not recommend, is that you can't achieve RBAC based on the naming convention.
Instead, you can achieve proper K8s RBAC by having an MR instance per tenant.
In the future, this may be covered by the Namespace-based organization currently under discussion here: https://github.com/kubeflow/community/pull/892
I'm not sure that it applies, but we had a similar problem for the catalog API. We intend to consume sources that have an <org>/<model> convention (e.g. Hugging Face) and some that don't (e.g. YAML files from a single org), so we opted to allow a single / in the model names, but not require it. So far, that's been enough for the catalog.
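For illustration only, an "allow a single /, but don't require it" rule could look roughly like the sketch below. This is not the catalog's actual code: `validateModelName` is a hypothetical name, and rejecting empty segments (such as a leading or trailing /) is my own assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// validateModelName is a hypothetical check: it accepts "<model>" or
// "<org>/<model>" and rejects names with more than one "/".
// Rejecting empty segments is an assumption made for this sketch.
func validateModelName(name string) error {
	parts := strings.Split(name, "/")
	if len(parts) > 2 {
		return fmt.Errorf("model name %q contains more than one '/'", name)
	}
	for _, part := range parts {
		if part == "" {
			return fmt.Errorf("model name %q has an empty segment", name)
		}
	}
	return nil
}

func main() {
	for _, name := range []string{"org/model", "model", "a/b/c", "/model"} {
		fmt.Printf("%q -> %v\n", name, validateModelName(name))
	}
}
```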
Again, I'm not sure this applies. If the goal is only to be consistent with some other systems, then a / in the name is probably enough. But if the goal is multi-tenancy (to be specific, my understanding is different groups of people using the same model registry and sharing some, but not all, models between them), then that's something else entirely.
If we do this, however it's implemented, I'd like to ask that the organization be optional. I know there are multi-tenant use-cases popping up, but there are also still plenty of single-tenant cases.