Provide an option to deploy statically the webhook resources
What would you like to be added:
In a more security restricted environment, there is a requirement to remove the permissions for admissionregistration.k8s.io from security-profile-operator cluster role in order to reduce the attack surface. This is not achievable with the current approach which dynamically creates/updates the webhook related resources. These API actions need these permissions. An alternative will be to create the webhook related resources statically as part of the operator deployment and skip the dynamic creation. It seems that they do not have dynamic dependencies.
This feature will introduce a running mode where the webhook resources are statically deploy outside of the operator. Most likely this will require a configuration option in the SPOD, in which case the operator will skip the creation and update of webhook related resources.
Why is this needed:
This is needed to remove the admissionregistration.k8s.io permissions from security-profiles-operator cluster role.
User story covered
I think this is good in general, I was suspecting that this request will come sooner or later. Just see one question and one historical remark below.
- were you thinking about changes to the RBAC manifests, too? You mentioned a SPOD option, but were you also thinking about not creating the perms at all (maybe through a kustomize manifest)? A spod option seems like a dynamic thing to me though and I wonder if it's really a measurable improvement if the operator still can modify webhooks, but is just told not to. On the other hand, if it the webhook permissions were not dynamic, while have a runtime option?
- just to give some context about the configurable webhooks: there are annotations that differ across disctributions plus the webhooks can be configured to listen on different namespaces or with different failure modes. The failure options and namespace selectors could be just removed probably and it could be documented that the administrator is supposed to configure the webhooks correctly and the annotations could be handled with kustomize manifests per distro. But the other thing I remember is that it was nontrivial to package webhook manifests in an operator bundle unless the whole operator is packaged using operator-sdk and the webhooks managed by the SDK. If we were to remove the ability to configure webhooks and have them created by the operator completely, then I'd have to figure out how to package the webhook manifests in an operator bundle. And I think the answer to that would be to create the bundle manually and not with operator-sdk, for better or worse, operator-sdk create bundle assumes that the operator it is bundling is also scaffolded by operator-sdk. Alternatively, we could re-scaffold SPO with operator-sdk, but 1) I've tried that and it's a lot of work and 2) the longer I work with operator-sdk on other projects, the more I appreciate SPO not using it :-)
I'm in favor of the enhancement in general, but I'm wondering which other implications they have. Do they make the overall deployment of the operator harder? I don't think so, correct?
were you thinking about changes to the RBAC manifests, too? You mentioned a SPOD option, but were you also thinking about not creating the perms at all (maybe through a kustomize manifest)? A spod option seems like a dynamic thing to me though and I wonder if it's really a measurable improvement if the operator still can modify webhooks, but is just told not to. On the other hand, if it the webhook permissions were not dynamic, while have a runtime option?
I am thinking to add an option in the SPOD configuration and of course remove the permissions from RBAC cluster role through kustomize when the option is on. The reason for this is to keep the backward compatibility and the existing behaviour, so the webhook would be configurable in two ways:
- dynamically as currently is implemented, in this case the option will be set to off and the permissions kept into RBAC role
- statically through kusomize, the permissions will be removed from RBAC role, the option will be set to on in the configuration and webhook manifest will be added to the deployment
I believe in this way we can keep the existing deployment and generate a deployment with a static webhook through kustomize. Maybe we can remove the dynamic version when we have a solution for operator packaging.
What do you think about this proposal?
@ccojocar doing it through kustomize in a static manner would be quite nice and shouldn't be too problematic. Installation-wise, this would be toggled on/off through a boolean we have Helm packaging ready (https://github.com/kubernetes-sigs/security-profiles-operator/issues/985 is in progress). My only concern is the OLM installation, I don't know how we could add a toggle for such a feature... I found https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md but am not sure if this is sufficient. @jhrozek got any ideas?
@ccojocar doing it through kustomize in a static manner would be quite nice and shouldn't be too problematic. Installation-wise, this would be toggled on/off through a boolean we have Helm packaging ready (#985 is in progress). My only concern is the OLM installation, I don't know how we could add a toggle for such a feature... I found https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md but am not sure if this is sufficient. @jhrozek got any ideas?
No, I don't think this is sufficient, the only way this could help us is that someone could use an environment variable to control if the hook would be created by default or not using the Env key. This might be useful as well, but maybe could be a follow-up.
Anyway, about the OLM - I think we need to decide as a project what is the default (webhook on/off) and package that in OLM. I don't think that the feature absolutely needs to be configurable in OLM as long as we have the runtime config (which I do like after all).
were you thinking about changes to the RBAC manifests, too? You mentioned a SPOD option, but were you also thinking about not creating the perms at all (maybe through a kustomize manifest)? A spod option seems like a dynamic thing to me though and I wonder if it's really a measurable improvement if the operator still can modify webhooks, but is just told not to. On the other hand, if it the webhook permissions were not dynamic, while have a runtime option?
I am thinking to add an option in the SPOD configuration and of course remove the permissions from RBAC cluster role through kustomize when the option is on. The reason for this is to keep the backward compatibility and the existing behaviour, so the webhook would be configurable in two ways:
* dynamically as currently is implemented, in this case the option will be set to off and the permissions kept into RBAC role * statically through kusomize, the permissions will be removed from RBAC role, the option will be set to on in the configuration and webhook manifest will be added to the deploymentI believe in this way we can keep the existing deployment and generate a deployment with a static webhook through kustomize. Maybe we can remove the dynamic version when we have a solution for operator packaging.
What do you think about this proposal?
Hi @ccojocar sorry about the delay in replying, I was on a long vacation and I'm still processing the e-mails and github notifications.
I do like your proposal! After triaging some CI failures and issues our QE team found while we are trying to stabilize SPO for openshift, I do agree that we should have an option to deploy the webhooks by default or not during runtime. I would even argue that the option should be off by default (but that's a detail and I can set the default in openshift).
The biggest issue I've seen is that the SPO webhooks are set to listen to a wide set of namespaces so any issue with the webhooks has the possibility to wreck the whole cluster.
Feel free to tag me on Slack or continue here if you want to continue fleshing out the ideas or if you would like to pair on the implementation.
Thank you for bringing this up!
@ccojocar this can be closed, right?
yes, it is fixed by #1053.