
support separate pipeline for each namespace

Status: Open. yaliqin opened this issue 4 years ago (59 comments).

EDIT from @Bobgy: This issue got 47 upvotes when requesting user feedback: https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302.

Proposed change

Currently, in version 1.0.2, a Kubeflow deployment using kfctl_k8s_istio shares pipelines across all namespaces defined in user profiles. Separate pipelines for each namespace are needed: multi-user notebook server separation is already supported, so supporting pipeline separation is the natural next step.

Alternative options

NA

Who would use this feature?

Many enterprises would benefit from this feature, since allocating separate namespaces to different teams is common practice in Kubernetes, and it lets resources in existing namespaces be used effectively.

Suggest a solution

NA

yaliqin, Jul 10 '20

FYI, in the soon-to-be-released KF 1.1, pipeline runs are already in user namespaces.

But the static pipeline YAML files are not separated. Do you want this as well?

Bobgy, Jul 10 '20

Related to https://github.com/kubeflow/pipelines/issues/1223

Bobgy, Jul 10 '20

There is more clarification in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656507073.

If you agree with the proposal in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302, can you rephrase your description to focus only on the pipeline resource for this issue?

Bobgy, Jul 10 '20

I think the pipeline YAML files should also be separated, for end-to-end (E2E) separation. Thanks. I already upvoted in #1223.

yaliqin, Jul 10 '20

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Oct 10 '20

/frozen

Bobgy, Oct 12 '20

Sorry, I'm kind of oversubscribed because of extra Kubeflow duties recently, so I may not be able to take this issue soon.

Leaving my previous thoughts about this:

For use-case context: some users want full separation of pipelines, and some want no separation. The distinction is mostly organization culture and I think both requests are valid.

So an MVP UX I can imagine that satisfies both is:

  1. pipelines in DB should have a namespace column
    • if namespace == '' (empty string), then it's in the shared space
    • if namespace is not empty, it's in that namespace
  2. all requests to get pipelines can specify namespace='' to query only shared pipelines
  3. or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
  4. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
  5. We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.
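The query semantics in the list above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not KFP backend code: `Pipeline`, `list_pipelines`, and the `allow_shared` flag (standing in for the config option in item 4) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

SHARED = ""  # an empty namespace marks a pipeline in the shared space


@dataclass
class Pipeline:
    name: str
    namespace: str = SHARED  # hypothetical model of the proposed namespace column


def list_pipelines(pipelines, namespace=SHARED, allow_shared=True):
    """Sketch of the proposed MVP query semantics.

    namespace == ""        -> shared pipelines only
    namespace == "team-a"  -> pipelines in "team-a" plus shared pipelines
    allow_shared=False     -> hypothetical config switch that hides shared pipelines
    """
    result = []
    for p in pipelines:
        if p.namespace == SHARED:
            if allow_shared:
                result.append(p)
        elif p.namespace == namespace:
            result.append(p)
    return result


store = [Pipeline("demo"), Pipeline("fraud", "team-a"), Pipeline("churn", "team-b")]
print([p.name for p in list_pipelines(store)])            # ['demo']
print([p.name for p in list_pipelines(store, "team-a")])  # ['demo', 'fraud']
```

Later in this thread the mixed result for a non-empty namespace was dropped in favor of an exact namespace match.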

Benefits:

  1. The same backend logic seems to be able to support both 'all shared' and 'all separated' use cases.
  2. There's no DB migration needed; this will be a seamless transition from older versions of KFP.

There are probably cases I haven't fully thought through. I'd suggest that anyone willing to push this issue forward do the following:

  1. try to discuss and converge on a set of CUJs (critical user journeys) that works for everyone. We should probably start with MVPs that address immediate concerns, while making sure we can head in a direction where all valid use cases can be supported.
  2. after consensus on the UX, discuss with maintainers a rough design of the implementation
  3. contribute PRs to implement this feature in smaller self-contained steps

Bobgy, Nov 04 '20

/cc @yanniszark about this

Bobgy, Nov 04 '20

and /cc @chensun who designed multi user backend separation for other KFP resources

Bobgy, Nov 04 '20

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want users to upload their pipelines to a shared space? Should they use Manage Contributors instead, as we did for experiments and runs?

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

The pipeline schema doesn't have a namespace concept yet. Does "seamless migration" here mean an implicit migration? We can treat all previous pipelines as having an empty namespace.

Jeffwan, Nov 05 '20

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want users to upload their pipelines to a shared space? Should they use Manage Contributors instead, as we did for experiments and runs?

Some customers I worked with prefer a shared space, though. The problem with Manage Contributors is that they want to take a pipeline from the shared space and run it in their own namespace, which is not possible by adding contributors.

(It might be a good idea to allow specifying pipelines in other users' namespaces and have the KFP backend check that permission, as it does for contributors. Is that what you are proposing?)

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

The pipeline schema doesn't have a namespace concept yet. Does "seamless migration" here mean an implicit migration? We can treat all previous pipelines as having an empty namespace.

If we do not introduce the shared pipeline concept, then we either need to figure out which pipeline belongs to which namespace or make all past pipelines inaccessible after an upgrade. This is fairly tricky to deal with.
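The backward-compatibility argument can be illustrated with SQLite from the Python standard library: adding a namespace column whose default is the empty string turns every existing row into a shared pipeline, without having to decide who owns it. KFP itself runs on MySQL with its own schema; the table and column names below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema before the change: pipelines have no namespace.
cur.execute("CREATE TABLE pipelines (UUID TEXT PRIMARY KEY, Name TEXT NOT NULL)")
cur.executemany(
    "INSERT INTO pipelines VALUES (?, ?)",
    [("p1", "legacy-a"), ("p2", "legacy-b")],
)

# Additive schema change: existing rows get the default '' and become shared.
cur.execute("ALTER TABLE pipelines ADD COLUMN Namespace TEXT NOT NULL DEFAULT ''")

# New uploads may set a namespace explicitly.
cur.execute("INSERT INTO pipelines VALUES (?, ?, ?)", ("p3", "team-model", "team-a"))

shared = [
    row[0]
    for row in cur.execute("SELECT Name FROM pipelines WHERE Namespace = '' ORDER BY Name")
]
print(shared)  # ['legacy-a', 'legacy-b']
```

No existing data has to be rewritten or reassigned; only the schema gains a column.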

Bobgy, Nov 05 '20

The problem with Manage Contributors is that they want to take a pipeline from the shared space and run it in their own namespace, which is not possible by adding contributors.

Got it, they like to "fork" the pipeline into their own namespace. Manage Contributors can only share pipelines between specific users; it cannot publish them to everyone. I think this makes sense.

to allow specifying pipelines in other users' namespace

It might be hard to deal with pipeline discovery (current KFP doesn't have granular control). Users may have to provide the pipeline name in the other user's namespace.

I think what you propose is a simple and better way to support isolated and shared pipelines.

then we either need to figure out which pipeline belongs to which namespace

I notice the UI actually determines the namespace from the central dashboard: https://github.com/kubeflow/pipelines/blob/935a9b5ba5057bc9801fee87ed17c03c2907ec85/frontend/src/lib/KubeflowClient.tsx#L23-L33.

Do you think it's better to determine the username and corresponding namespace via headers and KFAM? The backend can parse the user's email and query KFAM to find which namespace this profile owns and which namespaces other users share with the current user. KFP resources can then be filtered by these namespaces.

With this we can decouple the central dashboard from KFP. I think multi-user KFP could then be used separately from the central dashboard (probably only needing KFAM, profiles, and Istio).

Jeffwan, Nov 05 '20

Regarding the namespace selector, I think the topic was brought up before. If we plan to let KFP support multi-user mode without the central dashboard, we can build a namespace selector similar to the one in the central dashboard. It won't take much effort on the UI side (the UI code was designed to be very decoupled from the beginning).

Regarding KFAM, we have decided to deprecate it; @elikatsis sent out a PR a few days ago: https://github.com/kubeflow/pipelines/pull/4723. Let's move related discussion to that PR and keep the discussion here focused on separating pipeline resources.

Bobgy, Nov 06 '20

@Bobgy

  1. pipelines in DB should have a namespace column

Should we also have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?

  4. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option should allow organizations to enable or disable making pipelines public. Where do you think this config should be set (DB, ConfigMap, config.json)?

arllanos, Nov 16 '20

@Bobgy

  1. pipelines in DB should have a namespace column

Should we also have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?

Versions are subresources of pipelines, so they should be in the same namespace as their pipeline.

Description was more of an unfinished migration to the version API; ideally we should have a description field on versions too.
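A minimal sketch of "versions live in their parent's namespace", assuming a hypothetical model in which a version stores only its parent pipeline's ID and never a namespace of its own, so the namespace (and therefore any access check on a version) is always resolved through the parent:

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    id: str
    namespace: str  # "" means a shared pipeline


@dataclass
class PipelineVersion:
    id: str
    pipeline_id: str  # hypothetical: no namespace field; it is derived from the parent


def version_namespace(version, pipelines_by_id):
    """Return the namespace a version belongs to: always its parent pipeline's."""
    return pipelines_by_id[version.pipeline_id].namespace


pipelines = {"p1": Pipeline("p1", "team-a"), "p2": Pipeline("p2", "")}
print(version_namespace(PipelineVersion("v1", "p1"), pipelines))  # team-a
```

Keeping the namespace only on the parent avoids a version and its pipeline ever disagreeing about where they live.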

  4. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option should allow organizations to enable or disable making pipelines public. Where do you think this config should be set (DB, ConfigMap, config.json)?

You should use viper to get this config like other configs in API server, so that it will be configurable via config map, config.json or env var directly.

Bobgy, Nov 19 '20

@Jeffwan @Bobgy

We completed the implementation using KFAM (for now). From a UI standpoint, we only added a checkbox to indicate whether the pipeline is "Shared" or not. Pipeline versions, of course, cannot be shared, so if you select "create new pipeline..." this checkbox will not be there.

Most of the changes were in the Backend. You can see the implementation details on this commit: https://github.com/arllanos/pipelines/commit/2c88722f6f05b67acc16c2b4e7bc54ff91c3f36c

We have not implemented an environment variable to disable "Pipeline Sharing". However, this shouldn't be hard: I think it would suffice to disable the button altogether, and pipelines would then be namespaced by default.

Next week I'll switch the authentication to SubjectAccessReview and make the PR.

One thing to note is that, "Shared Pipelines" are still deletable by other users in the System. I'm not sure what the ideal scenario should be here.

[Screenshot, Nov 19 2020]

maganaluis, Nov 19 '20

@Bobgy @maganaluis thanks for your efforts on this. Some comments:

or specify namespace='XXX' to find pipelines in a namespace + shared pipelines

@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.

We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.
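One way to express this split, sketched in Python: build a Kubernetes `SubjectAccessReview` request whose resource depends on whether the pipeline is shared. The `pipelines.kubeflow.org` API group and the `clusterpipelines` resource name are assumptions for illustration (the latter is the hypothetical kind proposed above); only the overall `SubjectAccessReview` shape follows the Kubernetes `authorization.k8s.io/v1` API.

```python
import json

KFP_API_GROUP = "pipelines.kubeflow.org"  # assumed API group used in KFP RBAC rules


def access_review(user, verb, namespace):
    """Build a SubjectAccessReview body for a pipeline request (sketch).

    Shared pipelines (namespace == "") map to a hypothetical cluster-scoped
    "clusterpipelines" resource; namespaced ones map to "pipelines" in that
    namespace, so plain RoleBindings vs ClusterRoleBindings decide access.
    """
    attrs = {"group": KFP_API_GROUP, "verb": verb}
    if namespace:
        attrs.update(resource="pipelines", namespace=namespace)
    else:
        attrs["resource"] = "clusterpipelines"
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {"user": user, "resourceAttributes": attrs},
    }


print(json.dumps(access_review("alice@example.com", "list", "team-a"), indent=2))
```

An organization that never grants any `clusterpipelines` permission effectively disables shared pipelines without a separate config switch.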

yanniszark, Nov 19 '20

@Bobgy @maganaluis thanks for your efforts on this. Some comments:

or specify namespace='XXX' to find pipelines in a namespace + shared pipelines

@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.

You are right; I agree. So there needs to be some UI change to allow finding pipelines from either the shared or the namespaced ones.

We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.

I guess it depends more on whether shared pipelines will be a long-term UX we maintain. If so, I agree with your suggestion; it sounds like a clean solution. Do we have enough evidence that it will be?

I think there are still quite a few questions we need clear answers on. @maganaluis, would you be willing to draft a rough design doc for this issue?

Bobgy, Nov 23 '20

Hmm, maybe we can start with your PR description for discussion and see if everyone can agree on that.

Note that, regarding the above discussion, my personal opinion is that the MVP PR does not need to implement any of the ClusterPipeline RBAC stuff or the "pipeline sharing" switch. We can start from the minimum and enhance based on further requests.

Bobgy, Nov 23 '20

One thing to note is that, "Shared Pipelines" are still deletable by other users in the System. I'm not sure what the ideal scenario should be here.

It's backward-compatible behavior, so I don't think we need to worry about that right now.

Bobgy, Nov 23 '20

We still need UI work to finish this

Bobgy, Feb 26 '21

Remaining work item tracked in https://github.com/kubeflow/pipelines/issues/5084

Bobgy, Mar 01 '21

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Jun 02 '21

/lifecycle frozen

Bobgy, Jun 21 '21

To summarize the decisions made in the backend implementation supporting namespaced pipelines: https://github.com/kubeflow/pipelines/pull/4835.

  1. We now have shared pipelines and namespaced pipelines.
  2. Existing KFP pipeline resources from before the upgrade become shared pipelines: they do not belong to a namespace, and all users can access them. Access control for shared pipelines is TBD; if we implement it in the future, we may use cluster-wide RBAC permissions.
  3. Namespaced pipelines belong to a specific namespace and are access-controlled using namespaced RBAC roles/rolebindings, like other KFP resources.
  4. In the DB schema, both are pipeline resources; the only difference is whether the namespace field is empty.
  5. All backend API behaviors are backward compatible:
    • when uploading a pipeline: without a namespace -> shared pipeline; with a namespace -> namespaced pipeline
    • when listing pipelines: without a namespace -> returns only shared pipelines; with a namespace -> returns only pipelines in that namespace
  6. To be implemented for KFP UI integration:
    • on pipeline list page, there should be a new tab or checkbox that decides whether the user is looking at shared or namespaced pipelines
    • on create pipeline page, there should be a new checkbox that decides whether the uploaded pipeline should be shared or namespaced

references:

  • https://github.com/kubeflow/pipelines/pull/4835#issuecomment-765437001
  • https://github.com/kubeflow/pipelines/pull/4835#issuecomment-770495379
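The backward-compatible backend behavior in item 5 can be summarized with a small in-memory model. `PipelineStore`, `upload_pipeline`, and `list_pipelines` are hypothetical names, not the KFP API; the point is that an empty namespace marks a shared pipeline and listing uses an exact namespace match, so shared and namespaced results are never mixed.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineStore:
    """Hypothetical in-memory stand-in for the pipelines table."""

    rows: list = field(default_factory=list)

    def upload_pipeline(self, name, namespace=""):
        # Without a namespace -> shared pipeline; with one -> namespaced pipeline.
        self.rows.append({"name": name, "namespace": namespace})

    def list_pipelines(self, namespace=""):
        # Exact match only: "" lists shared pipelines, "team-a" lists only team-a's.
        return [r["name"] for r in self.rows if r["namespace"] == namespace]


store = PipelineStore()
store.upload_pipeline("legacy")                    # shared
store.upload_pipeline("fraud", namespace="team-a")  # namespaced
print(store.list_pipelines())          # ['legacy']
print(store.list_pipelines("team-a"))  # ['fraud']
```

Because omitting the namespace keeps the old behavior, existing clients that never send one continue to see exactly the pipelines they saw before.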

Bobgy, Aug 18 '21

Does the SDK support uploading namespaced pipelines? It does not seem to be included in the docs https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html

jacobmalmberg, Oct 28 '21

For anyone coming here and wanting a TL;DR on the issue search (some of this is a repeat of @Bobgy's great summary above):

ca-scribner, Feb 16 '22

/cc @zijianjoy

Regarding SDK changes: only Python changes are needed; the Go clients are auto-generated before each release.

Bobgy, Feb 17 '22

Oh great! So the API behind this call already has the ability to accept a namespace once the clients are regenerated?

ca-scribner, Feb 18 '22

Basically, for the frontend, is it a matter of transplanting code like this:

https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/ExperimentList.tsx#L203-L204

and this:

https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/ExperimentList.tsx#L291-L294

Into the pipeline screens: https://github.com/kubeflow/pipelines/blob/0c0e4b61050e6fea6f20dd35e14eb3edc214bbba/frontend/src/pages/PipelineList.tsx#L178

?

grobbie, Feb 25 '22