support separate pipeline for each namespace
EDIT from @Bobgy: This issue got 47 upvotes when requesting user feedback: https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302.
Proposed change
Currently, with version 1.0.2, a Kubeflow deployment using kfctl_k8s_istio shares pipelines across all namespaces defined in user profiles. Separate pipelines for each namespace are needed: multi-user separation of notebook servers is already supported, so it is natural to support pipeline separation as well.
Alternative options
NA
Who would use this feature?
Many enterprises would benefit from this feature, since allocating different namespaces to different teams is common practice in Kubernetes and lets resources in existing namespaces be used effectively.
Suggest a solution
NA
FYI, in the upcoming KF 1.1 release, pipeline runs are already in user namespaces, but the static pipeline YAML files are not separated. Do you want this additionally?
Related to https://github.com/kubeflow/pipelines/issues/1223
There is more clarification in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656507073.
If you agree with the proposal in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302, can you rephrase your description to focus only on pipeline resources for this issue?
I think the pipeline YAML files should also be separated, for E2E separation. Thanks. Already upvoted in #1223.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/frozen
Sorry, I'm kind of oversubscribed because of extra Kubeflow duties recently, so I may not be able to take this issue soon.
Leaving my previous thoughts about this:
For use-case context: some users want full separation of pipelines, and some want no separation. The distinction is mostly organizational culture, and I think both requests are valid.
So I think an MVP UX that satisfies both would be:
- pipelines in DB should have a namespace column
  - if namespace == '' (empty string), then it's in the shared space
  - if namespace is not empty, it's in that namespace
- all requests to get pipelines can specify namespace='' to query only shared pipelines
  - or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
- We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
- We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.
Benefits:
- The same backend logic seems to be able to support both 'all shared' or 'all separated' use cases.
- There's no DB migration needed, this will be a seamless transition from older versions of KFP.
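The lookup rule proposed above can be sketched in a few lines of Python. This is only an illustration of the semantics, not KFP backend code: `Pipeline`, `list_pipelines`, and the `shared_enabled` flag (standing in for the proposed config option) are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

SHARED = ""  # an empty namespace marks a shared pipeline in this proposal


@dataclass(frozen=True)
class Pipeline:
    name: str
    # Rows created by older KFP versions have no namespace, so they
    # default to the shared space without any data migration.
    namespace: str = SHARED


def list_pipelines(
    pipelines: List[Pipeline],
    namespace: str = SHARED,
    shared_enabled: bool = True,
) -> List[Pipeline]:
    """Return the pipelines visible for `namespace` under the proposed MVP.

    namespace == ''  -> shared pipelines only
    namespace == 'X' -> pipelines in X plus shared pipelines
    shared_enabled   -> hypothetical switch that hides the shared space
    """
    visible = []
    for p in pipelines:
        if p.namespace == SHARED:
            if shared_enabled:
                visible.append(p)
        elif p.namespace == namespace:
            visible.append(p)
    return visible
```

With `shared_enabled=False`, a query for a namespace returns only that namespace's pipelines, which corresponds to the 'all separated' use case.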
Probably there are cases I haven't fully thought through. I'd suggest anyone who's willing to push this issue forward to do the following:
- try to discuss and converge on a set of CUJs (critical user journeys) that work for everyone. Note that we should probably start with MVPs that address immediate concerns, while making sure we can head in a direction where all valid use cases can be supported.
- after consensus on the UX, discuss with maintainers a rough design of the implementation
- contribute PRs to implement this feature in smaller self-contained steps
/cc @yanniszark about this
and /cc @chensun who designed multi user backend separation for other KFP resources
@Bobgy @chensun
We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.
Do we want users to upload their pipelines to the shared space? Should they use Managed Contributor instead, as we did for experiments and runs?
There's no DB migration needed, this will be a seamless transition from older versions of KFP.
The pipeline schema doesn't have a namespace concept yet. Seamless migration here means implicit migration, right? For all previous pipelines, we can treat them as having an empty namespace.
@Bobgy @chensun
We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.
Do we want users to upload their pipelines to the shared space? Should they use Managed Contributor instead, as we did for experiments and runs?
Some customers I worked with prefer this shared space, though. A problem with manage contributor is that they want to take a pipeline from the shared space and run it in their own namespace. This will not be possible by adding contributors.
(It might be a good idea to allow specifying pipelines in other users' namespace, with the KFP backend checking that permission as it does for contributors. Is that what you are proposing?)
There's no DB migration needed, this will be a seamless transition from older versions of KFP.
The pipeline schema doesn't have a namespace concept yet. Seamless migration here means implicit migration, right? For all previous pipelines, we can treat them as having an empty namespace.
If we do not introduce the shared pipeline concept, then we either need to figure out which pipeline belongs to which namespace or make all past pipelines inaccessible after an upgrade. This is fairly tricky to deal with.
A problem with manage contributor is that they want to take a pipeline from the shared space and run it in their own namespace. This will not be possible by adding contributors.
Got it, they would like to "fork" the pipeline into their own namespace. Managed contributor can only share pipelines between specific users; it cannot make them public to all. I think this makes sense.
to allow specifying pipelines in other users' namespace
It might be hard to deal with pipeline discovery (current KFP doesn't have granular control). Users may have to provide the pipeline name in the other user's namespace.
I think what you propose is a simpler and better way to support isolated and shared pipelines.
then we either need to figure out which pipeline belongs to which namespace
I notice the UI actually determines the namespace from centraldashboard: https://github.com/kubeflow/pipelines/blob/935a9b5ba5057bc9801fee87ed17c03c2907ec85/frontend/src/lib/KubeflowClient.tsx#L23-L33
Do you think it would be better to figure out the username and corresponding namespace via headers and KFAM? The backend could parse the user's email and check KFAM to find which namespace this profile owns and which namespaces other users share with the current user. Then KFP resources can be filtered by these namespaces.
With this we can decouple centraldashboard from KFP. I think multi-user KFP could then be used separately from centraldashboard (it would probably only need KFAM profile and Istio).
Regarding the namespace selector, I think the topic was brought up before too. If we plan to let KFP support multi-user mode without centraldashboard, we can build a namespace selector similar to the one in centraldashboard. It won't be much effort on the UI side (the UI code was designed to be very decoupled from the beginning).
Regarding KFAM, we have decided to deprecate it; @elikatsis sent out a PR a few days ago: https://github.com/kubeflow/pipelines/pull/4723. Let's move related discussion to that PR, and keep discussion here focused on separating pipeline resources.
@Bobgy
- pipelines in DB should have a namespace column
Shall we have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?
- We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
A config option would let organizations enable or disable the possibility of public pipelines. Where do you think this config should be set (DB, configMap, config.json)?
@Bobgy
- pipelines in DB should have a namespace column
Shall we have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?
Versions are sub-resources of pipelines, so they should be in the same namespace as the pipeline.
Description was more of an unfinished migration to the version API; ideally we should have a description field on versions too.
- We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
A config option would let organizations enable or disable the possibility of public pipelines. Where do you think this config should be set (DB, configMap, config.json)?
You should use viper to get this config, like other configs in the API server, so that it will be configurable via config map, config.json, or env var directly.
@Jeffwan @Bobgy
We completed the implementation using KFAM (for now). From a UI standpoint, we only added a checkbox to indicate whether the Pipeline is "Shared" or not. Pipeline Versions of course cannot be shared, so if you select "create new pipeline..." this checkmark will not be there.
Most of the changes were in the Backend. You can see the implementation details on this commit: https://github.com/arllanos/pipelines/commit/2c88722f6f05b67acc16c2b4e7bc54ff91c3f36c
We have not implemented an env variable to disable "Pipeline Sharing", but this shouldn't be too bad. I think it should suffice to disable the button altogether, and Pipelines will be namespaced by default.
Next week I'll switch the authentication to SubjectAccessReview and make the PR.
One thing to note is that, "Shared Pipelines" are still deletable by other users in the System. I'm not sure what the ideal scenario should be here.
@Bobgy @maganaluis thanks for your efforts on this. Some comments:
or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.
We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.
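A rough Python sketch of this idea, not KFP code: the resource kind to authorize against is derived from whether the namespace is empty, and access is a lookup in a set of grants. The names `resource_kind` and `is_allowed`, the lowercase kind strings, and the in-memory grant set are all hypothetical; a real implementation would hand the decision to Kubernetes RBAC (for example through a SubjectAccessReview).

```python
from typing import Set, Tuple

# (verb, kind, namespace); namespace is "" for the cluster-wide kind.
Permission = Tuple[str, str, str]


def resource_kind(namespace: str) -> str:
    """Map a pipeline's namespace to the RBAC kind that guards it."""
    return "clusterpipelines" if namespace == "" else "pipelines"


def is_allowed(granted: Set[Permission], verb: str, namespace: str) -> bool:
    """Check a request against the grants held by the calling user."""
    return (verb, resource_kind(namespace), namespace) in granted


# An organization that does not want shared pipelines simply never
# grants anything on "clusterpipelines".
team_a_user = {("list", "pipelines", "team-a"), ("create", "pipelines", "team-a")}
print(is_allowed(team_a_user, "list", "team-a"))  # namespaced pipelines
print(is_allowed(team_a_user, "list", ""))        # shared pipelines
```

The point of the design is that no feature flag is needed: withholding one of the two permissions has the same effect as disabling that kind of pipeline.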
@Bobgy @maganaluis thanks for your efforts on this. Some comments:
or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.
You are right. Agree with that. So there needs to be some UI change to allow finding pipelines from either shared or namespaced ones.
We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.
I guess it depends more on whether the shared pipelines will be a long term UX we will maintain. If it is, then I agree with your suggestion, that sounds like a clean solution. Do we have enough evidence it will be?
I think there are still quite a few questions we need clear answers on. @maganaluis, would you be willing to draft a rough design doc for this issue?
Hmm, maybe we can start with your PR description for discussion and see if everyone can agree on that.
Note that, for the above discussion, my personal opinion is that the MVP PR does not need to implement any of the ClusterPipeline RBAC stuff or the "pipeline sharing" switch. We can start from the minimum and enhance based on further requests.
One thing to note is that, "Shared Pipelines" are still deletable by other users in the System. I'm not sure what the ideal scenario should be here.
It's backward compatible behavior, so I don't think we need to worry about that right now.
We still need UI work to finish this
Remaining work item tracked in https://github.com/kubeflow/pipelines/issues/5084
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
To summarize the decisions made in the backend implementation that supports namespaced pipelines: https://github.com/kubeflow/pipelines/pull/4835.
- We now have shared pipelines and namespaced pipelines.
  - Past KFP pipeline resources are now shared pipelines; they do not belong to a namespace and all users can access them. Access control for shared pipelines is TBD; if we implement it in the future, we may use cluster-wide RBAC permissions.
  - Namespaced pipelines belong to a specific namespace; they are access controlled using namespaced RBAC roles/rolebindings like other KFP resources.
  - In the DB schema, both of them are pipeline resources; the difference is just whether the namespace field is empty or not.
- All backend API behaviors are backward compatible
  - when uploading a pipeline, without namespace -> shared pipeline; with namespace -> namespaced pipeline
  - when listing pipelines, without namespace -> returns shared pipelines only; with namespace -> namespaced pipelines only
- To be implemented, for KFP UI integration
  - on pipeline list page, there should be a new tab or checkbox that decides whether the user is looking at shared or namespaced pipelines
  - on create pipeline page, there should be a new checkbox that decides whether the uploaded pipeline should be shared or namespaced
references:
- https://github.com/kubeflow/pipelines/pull/4835#issuecomment-765437001
- https://github.com/kubeflow/pipelines/pull/4835#issuecomment-770495379
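The upload and list behavior summarized above can be modeled with a small in-memory store. This is a sketch of the semantics only, assuming nothing about the real API server: `PipelineStore` and its methods are hypothetical names, and the store keeps just a namespace and a name per pipeline.

```python
from typing import List, Tuple


class PipelineStore:
    """Toy model of the summarized backend behavior (not KFP code)."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str]] = []  # (namespace, name)

    def upload(self, name: str, namespace: str = "") -> None:
        # Without a namespace the pipeline is shared, which is what
        # clients written against older KFP versions already do.
        self._rows.append((namespace, name))

    def list(self, namespace: str = "") -> List[str]:
        # Exact match only: '' returns shared pipelines, 'X' returns
        # the pipelines in X. Results are never mixed.
        return [name for ns, name in self._rows if ns == namespace]


store = PipelineStore()
store.upload("legacy-pipeline")             # shared
store.upload("train", namespace="team-a")   # namespaced
print(store.list())          # shared pipelines only
print(store.list("team-a"))  # team-a pipelines only
```

Because listing never mixes results, a UI needs a separate control (the tab or checkbox mentioned above) to switch between the two views.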
Does the SDK support uploading namespaced pipelines? It does not seem to be included in the docs https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html
For anyone coming here and wanting a TL;DR from the issue search (some of this repeats @Bobgy's great summary above):
- Early implementation tracking issue: Includes a design doc from 2019
- Isolation of pipeline definitions (this issue): Not fully implemented, but see the summaries above. The backend appears done (including updates to the SQL DB, etc.)
- Isolation of pipeline definitions (backend): Implemented backend changes to support isolation of pipelines
- Isolation of pipeline definitions (frontend): Basics spec'd out, but no work completed. Arrikto expressed interest in doing it ~1 year ago but hasn't had time (last response a few months ago).
- Isolation of pipeline definitions (sdk): As noted above, the SDK needs at least some edits to enable this. Not sure if it just needs Python changes or something deeper in the Go code
/cc @zijianjoy
Re SDK changes: it's just Python changes; the Go clients are auto-generated before each release.
Oh great! So something like the API behind this call does have the ability to accept a namespace once things are regenerated?
Basically, for the front end, is it a matter of transplanting code like this:
https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/ExperimentList.tsx#L203-L204
and this:
https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/ExperimentList.tsx#L291-L294
into the pipeline screens (https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/PipelineList.tsx#L178)?