operator icon indicating copy to clipboard operation
operator copied to clipboard

Prune Istio Namespace created by Operator when deleting Operator cr

Open irisdingbj opened this issue 4 years ago • 36 comments

istio/istio#18384

  • Do not update Istio namespace if the namespace is not created by Istio Operator

  • Prune Istio Namespace if it is created by Istio Operator

  • Leave Istio namespace if it is not created by Istio Operator

irisdingbj avatar Oct 30 '19 08:10 irisdingbj

https://github.com/istio/istio/issues/18384

irisdingbj avatar Oct 30 '19 08:10 irisdingbj

@ostromart @howardjohn @rcernich I agree the comments that we should not delete the namespace if it is created by user.

But if the namespace is created by our operator. It means operator need manage the lifecycle of the namespace as said by @rcernich

So this is what the PR want to do:

Delete the ns if it is create by operator. Leave the ns untouched(not add related operator label and annotation) and do not delete the ns if it is not created by operator.

irisdingbj avatar Oct 31 '19 01:10 irisdingbj

/test gencheck_operator

sdake avatar Oct 31 '19 05:10 sdake

@ostromart @howardjohn @rcernich I agree the comments that we should not delete the namespace if it is created by user.

But if the namespace is created by our operator. It means operator need manage the lifecycle of the namespace as said by @rcernich

So this is what the PR want to do:

Delete the ns if it is create by operator. Leave the ns untouched(not add related operator label and annotation) and do not delete the ns if it is not created by operator.

Agree. Objects created by automation should be deleted by said automation. The objection is the side effect of deleting a namespace could have unintended consequences of deleting objects erroneously added by a user in the deployed namespace.

This should be simple enough to control given the manifest we ship and the WATCH_NAMESPACE environment variable. If a CR is registered in this WATCH_NAMESPACE and the CR's https://github.com/istio/operator/blob/master/pkg/apis/istio/v1alpha2/istiocontrolplane_types.proto#L186 is set and a namespace is created setting appropriate annotations and labels indicating the namespace was created by the operator, the user no longer has an expectation of data stored within this namespace to be retained.

Cheers -steve

sdake avatar Oct 31 '19 06:10 sdake

Regarding the namespaces, we shouldn't be creating them at all. We should verify its existence in the validator and reject the CR if the target namespace does not exist or if the user creating the CR does not have permission to access the specified namespace. I believe this could be a potential security issue, e.g. if a user specifies a rogue image along and requests installation into a system namespace. We should really reconsider the current approach.

rcernich avatar Oct 31 '19 15:10 rcernich

+1 to @rcernich shouldn't Istio be installed in the same namespace as the ICP, which implies the namespace already exists?

howardjohn avatar Oct 31 '19 15:10 howardjohn

We need to create the namespace to accommodate a single line install. There's just no way around it. The current check in the code is insufficient - what if a user had already created the ns before installing the operator? Either we check for that condition and not delete the ns after, or (better imo) leave the ns alone when deleting. It's such a trivial step, I don't see it as necessary to take the risk or introduce extra code to do it.

ostromart avatar Oct 31 '19 17:10 ostromart

@ostromart the issue is that the user shouldn't be specifying the namespace at all when using the operator. The namespace within which the CR is created is where the operator should create the resources. The operator use case is different from the command-line use case, where the cli is creating resources which need to be created/applied separately.

rcernich avatar Oct 31 '19 17:10 rcernich

@howardjohn there are good reasons for keeping the operator and ICP in a separate ns from Istio, namely creating an administrative boundary and making it easy to uninstall operator without uninstalling Istio. RH take is a step further in their model by having the ICP in a completely different ns. We support that through a flag but currently the flag is set to the operator ns. The operator install currently creates the istio-operator ns. IMO we should not delete this ns either when operator in removed because a user may be transitioning from using the controller to using istioctl manifest apply, in which case they will want to preserve their ICP CR and use that as an input to istioctl.

ostromart avatar Oct 31 '19 18:10 ostromart

@ostromart, the operator can watch any namespace and should be watching all namespaces. When an ICP is creating in a namespace, that's the namespace that should contain the control plane. The operator should be completely isolated and regular users should be able to create control planes. Typical configuration scenarios for an operator are:

  • Only watch operator namespace (not an option for Istio because of the perms required by the operator)
  • Watch a single namespace (other than operator namespace)
  • Watch a set of namespaces (difficult to implement in practice)
  • Watch all namespaces

The operator install should not create the namespace. The flow should be:

  1. create operator namespace
  2. kubectl apply -n operator-namespace -f operator.yaml

Removal should be: kubectl delete -n operator-namespace -f operator.yaml

The goal: user must be cluster-admin to install operator, but any user should be able to install a control plane.

rcernich avatar Oct 31 '19 21:10 rcernich

@ostromart @sdake @howardjohn @rcernich @linsun Thanks guys for your comments here.

SO we have two namespace here: the operator namespace and the istio installation namespace(eg: istio-system).

I agree we should keep the operator namespace untouched.

Now we will add operator labels and annotations into the istio installation namespace(eg: istio-system) even if it exists before hand.

So we should at least make the logic in pkg/helmreconciler/rendering.go within this PR go into master code. -- This is logic to make sure operator will not touch the pre-existed Istio installation namespace.

For whether we need prune the istio installation namespace(eg:istio-system) and operator namespace, Let's open an issue and move our discussion there to get a solid agreements.

Does this sounds reasonable?

irisdingbj avatar Nov 01 '19 06:11 irisdingbj

The user installing the control plane should create the istio-system namespace. They should create the ICP resource in istio-system. (Replace istio-system with any namespace the user wants to use for installation.)

rcernich avatar Nov 01 '19 14:11 rcernich

Rob, what's the motivation for putting the ICP CR in istio-system? It's possible to create rbac that restricts a user from touching anything other than ICP CR in either ns.

ostromart avatar Nov 01 '19 16:11 ostromart

Once again, ICP should sit beside the installed control plane. It should not be collocated with the operator itself. The ICP CR specifies the configuration for a control plane. The operator watches namespaces for ICP resources and act on it accordingly. Regular users should have no access to the operator namespace, especially because the operator service account has elevated privileges, which are required to instantiate resources across the cluster. A regular user should be able to create an ICP in a namespace they have access to, so they can install a control plane, without needing to be cluster-admin. This is the basic problem solved by the operator: normal users can manage a control plane installation.

Yes, RBAC can be used to restrict who can create/update/delete ICP CRs, but, as mentioned above, one of the benefits of using an operator is that a user does not need cluster-admin to administer a control plane installation.

rcernich avatar Nov 01 '19 17:11 rcernich

Once again, ICP should sit beside the installed control plane. It should not be collocated with the operator itself. The ICP CR specifies the configuration for a control plane. The operator watches namespaces for ICP resources and act on it accordingly. Regular users should have no access to the operator namespace, especially because the operator service account has elevated privileges, which are required to instantiate resources across the cluster. A regular user should be able to create an ICP in a namespace they have access to, so they can install a control plane, without needing to be cluster-admin. This is the basic problem solved by the operator: normal users can manage a control plane installation.

Yes, RBAC can be used to restrict who can create/update/delete ICP CRs, but, as mentioned above, one of the benefits of using an operator is that a user does not need cluster-admin to administer a control plane installation.

@rcernich sounds consistent to me. Some use cases may wish to restrict who can install a control plane. Istio's control plane uses cluster scoped resources, so we must have the ability to restrict where ICPs can be registered in the model you propose. This more consistent approach doesn't really feel appropriate for a backport.

How can we solve this specific problem without major rework for 1.4?

Cheers -steve

sdake avatar Nov 01 '19 17:11 sdake

Hey @sdake, I'm not sure about 1.4, but based on my limited understand, we should reconfigure the operator deployment so that the operator watches all namespaces. As for users creating ICP resources, that will already be constrained based on RBAC, so admins will need to grant create/update/delete roles to individual users/groups. For folks used to doing everything as cluster-admin, the only difference they'd see is that they create the ICP resource in whatever namespace (e.g. istio-system). I don't know if the namespace fields in ICP are required, but if they're not, they should just be left blank. If not, the operator should balk if the namespace doesn't match the containing namespace and/or a validator checks the setting and rejects it. I'm guessing, of all this, all that needs to be done is modify the operator deployment yaml and override the target namespace with the namespace containing the ICP before rendering the charts.

rcernich avatar Nov 01 '19 17:11 rcernich

Apologies in advance if I'm missing something obvious as security is not my forte. Just trying to understand your proposal better. I have no objection to moving the CR out of istio-operator. In the code we have it's already straightforward to do so without any code changes if that's how a user wishes to deploy it. The bit I don't understand is why moving it into istio-system makes it more secure. If you have a model where the user is not allowed to make changes to the control plane except through the CR, and you don't find restricting access to only ICP CR through RBAC sufficiently secure, how does having the CR in istio-system solve the problem? Afaict you're just creating a problem that the user must now create istio-system so they can populate the CR - so you give up the one line install and delete. I can see an argument where the CR lives in a completely different ns (say istio-operator-cr) and user only has access to that, but I'm missing why it should be in istio-system.

ostromart avatar Nov 01 '19 18:11 ostromart

The operator namespace should be managed by cluster-admin, so moving ICP out of operator namespace allows control planes to be created with lower level permissions, although they would still need permissions to create/update/delete ICP resources. When using Operator Lifecycle Manager to install the operator, roles for managing the operator's associated CRDs are created along with the operator and bound into the standard admin/edit/view roles. This effectively means each namespace's administrators may instantiate an ICP resource, and thus their own control plane. This will become much more important when Istio supports multitenancy (similar to Maistra). It also becomes more important as the roles required by the components themselves are reduced.

Also, this is the general convention used when working with operators, although most of the k8s oriented examples seem to just dump everything into default namespace (which might be great for kicking tires, but falls short when working in a managed cluster).

HTH

rcernich avatar Nov 01 '19 19:11 rcernich

In general I agree with the principles @rcernich mentioned here. Operator is cluster/mesh admin, which has the privilege to read all ICPs from different namespaces. Each namespace admin is responsible for managing the resource in his own namespace.

From the examples we discussed, I only see "operator" and "istio-system" namespaces are mentioned. Both should be managed by cluster/mesh admins. Do we have use cases that the ICPs are namespace specific (hence should be set by namespace admins)?

liminw avatar Nov 01 '19 20:11 liminw

From the examples we discussed, I only see "operator" and "istio-system" namespaces are mentioned. Both should be managed by cluster/mesh admins. Do we have use cases that the ICPs are namespace specific (hence should be set by namespace admins)?

My understanding: Right now, running Istio basically requires admin privileges today, so this is less important. In OpenShifts case, they have some modifications to Istio that actually do allow namespace admins to run Istio with minimal privileges, multiple Istio installs, etc. We will hopefully start moving in this direction, so we should make sure the operator will support this. I recommend we follow Rob's suggestion of creating ICP in the namespace that istio will be installed in, to follow the pattern of other operators.

On Fri, Nov 1, 2019 at 1:45 PM Limin Wang [email protected] wrote:

In general I agree with the principles @rcernich https://github.com/rcernich mentioned here. Operator is cluster/mesh admin, which has the privilege to read all ICPs from different namespaces. Each namespace admin is responsible for managing the resource in his own namespace.

From the examples we discussed, I only see "operator" and "istio-system" namespaces are mentioned. Both should be managed by cluster/mesh admins. Do we have use cases that the ICPs are namespace specific (hence should be set by namespace admins)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/istio/operator/pull/484?email_source=notifications&email_token=AAEYGXORBKQ5LLB6HV3ZUZLQRSINHA5CNFSM4JGVKCO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEC4DS4A#issuecomment-548944240, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYGXKHZNCTDIXN2VO7GFLQRSINHANCNFSM4JGVKCOQ .

howardjohn avatar Nov 01 '19 20:11 howardjohn

@liminw, two things:

  1. Service mesh admin need not be the same as cluster-admin.
  2. We (Red Hat) would like to contribute the work we did to support multitenancy in the Maistra project back upstream into Istio proper. This work includes a reduction in the scope of permissions assigned to Istio component service accounts, along with moving functionality requiring elevated privileges into the operator. This means that any user can install a service mesh and more than one service mesh can be installed in a cluster.

The second point is what is really driving the rationale behind this. The goal of the operator should be to allow self-provisioning within a managed k8s cluster. At present, this would mean that a service mesh admin would have greater privileges than a regular user, but would not have cluster-admin privileges.

HTH

rcernich avatar Nov 01 '19 20:11 rcernich

The operator already supports the CR being in any namespace the admin doing the install chooses. I'm still wondering why that should be istio-system today, given that we're trying to restrict user privileges, rather than another ns altogether? Per Limin's point, istio-system should also be for admins. User should only be able to interact with Istio installation through the CR they control.

ostromart avatar Nov 01 '19 21:11 ostromart

istio-system should also be for admins. User should only be able to interact with Istio installation through the CR they control.

The user that creates the control plane namespace (e.g. istio-system) IS the "admin" (for that control plane/mesh). The idea is you have one operator listening/watching all namespaces. Bob creates/installs Istio via a CR he creates in his bob-istio-system and Mary creates/installs Istio via her own CR in mary-istio-system. Each is the admin for their own mesh - each has permissions to their own control plane namespace. They should never have permission to even see istio-operator namespace let alone create objects in that namespace.

jmazzitelli avatar Nov 01 '19 21:11 jmazzitelli

From @rcernich and @jmazzitelli 's comments, there are multiple control planes shared by different tenants, and all of them share the same operator. This requirement seems reasonable for such use case. I think we should follow @rcernich 's suggestion to allow the tenants to own their individual configuration.

liminw avatar Nov 01 '19 21:11 liminw

I completely agree that for multi-tenant and/or managed case it makes sense not have the CR in istio-operator. I do still wonder why the cluster admin would install a managed Istio for a user and then afterwards allow them to interact directly with the control plane resources, rather than having a separate config namespace and having them do it solely through the CR. It seems difficult to support as a managed solution, but there's nothing preventing this in the operator - just listen to all namespaces. We can make both changes in a new PR. In practical terms for this PR, I think at least everyone agrees that the operator should not delete the ns? Sorry @irisdingbj for hijacking your PR for this lengthy discussion...

ostromart avatar Nov 01 '19 22:11 ostromart

@ostromart, cluster admin just installs the operater. A mesh admin (designated by cluster-admin) manages the control plane entirely: creates namespace, creates CR, etc. This mesh admin should be managing the control plane installation through the CR, not manually. The mesh admin should not be modifying the resources managed by the operator.

rcernich avatar Nov 01 '19 23:11 rcernich

and then afterwards allow them to interact directly with the control plane resources, rather than having a separate config namespace and having them do it solely through the CR.

Just to be clear, the tenant owner should NOT be allowed to interact directly with control plane resources that are created by the operator. I think the way you are thinking it should work, is how it works (minus the part about needing a separate config namespace). The tenant owner should configure things solely through the CR - this is true. The tenant owner does not interact directly with control plane resources. They will not (should not) have rights to do that - I believe this will be the job of the cluster-admin to provide the correct permissions to the tenant owners (right @rcernich?). But the tenant owner will have the rights to create, modify, and delete CRs. It should be fine to be able to put the CR directly in the control plane namespace.

So, in short (and @rcernich can explain the RBAC issues more in depth and correct me where I am wrong, but this is how I understand it) the tenant owner will not have the necessary permissions to directly manipulate the control plane resources - the tenant owner will only have permissions to manipulate the CR itself. It is the job of the cluster-admin to provide the correct permissions to the proper tenant owners.

jmazzitelli avatar Nov 02 '19 08:11 jmazzitelli

Hey @jmazzitelli, nothing prevents namespace admins from interacting with those objects and currently we don't watch for changes of those resources to overwrite any changes made by somebody other than the operator. Additionally, because we use three way patching during updates, it's likely that any changes made by a user will be persisted through updates, unless they changed something that is explicitly set through the operator. This was done on purpose to give users the ability to workaround any issues or temporarily change things based on operational needs (i.e. escape hatch). That said, they really shouldn't be messing with things, as they can put things into a bad state (e.g. deleting a deployment). There are also practical concerns given the amount of settings that might have to be exposed. In reality, this is no different than working with the charts/templates directly.

Having said all that, the user should be doing everything through the CR, even though they may have the ability to interact with the resources created by the operator. (Sort of like buying a car and deciding to work on it yourself.)

rcernich avatar Nov 04 '19 15:11 rcernich

Sort of like buying a car and deciding to work on it yourself.

OK, got it.

I tell people "don't mess with the operator-created resources - just edit the CR to modify your operand installation." But good analogy with the car - if you know how to change your own oil - feel free. Just don't complain to me when you seize the engine. :)

jmazzitelli avatar Nov 04 '19 16:11 jmazzitelli

After reading through this (long) review, I am unclear on the next actions for @irisdingbj to make to get this PR merged. Options could be simple like "change X" or "change Y" in this specific code. If the answer is more complex, as this review would seem to indicate, then please suggest filing a bug around the points of contention so they may be addressed in followup PRs.

It is unreasonable to expect @irisdingbj to resolve all of this thread in one PR - the PR would be massive. I strongly prefer forward iteraataive progress over a stalled PR.

Cheers -steve

sdake avatar Nov 04 '19 17:11 sdake