Allow operator to start even if not all watched namespaces are accessible
Bug Report
If the operator is configured to watch a certain list of namespaces (lets say ns1, ns2) if any one of them is not accessible the operator cannot create the informers and simply won't start even if other namespaces could be watched.
It would be better if the operator would simply LOG an error and periodically retry the creation of the informers for the namespaces with the missing access rights. This could be important in environments when the operator is watching user controlled namespaces, and to avoid scenarios where the user accidentally (or intentionally) removes the rolebindings that the operator needs for watching it. Currently a malicious user can basically kill the operator by simply deleting the rolebinding.
What did you do?
Start the operator to watch ns1,ns2 but only set the role binding for ns1.
What did you expect to see?
Log periodic error about missing permissions for ns2 and start watching ns1
What did you see instead? Under which circumstances?
The operator doesn't start at all.
Environment
Minikube
IMO this makes sense for the case when the operator is watching a list of namespaces.
Wondering if this behavior still makes sense when it is either watching one namespace or the whole cluster? What do you think?
I think it makes sense to throw a fatal error in those cases, I agree.
Ideally we should have a good way to expose framework level errors to the implementing operators. This way users could integrate logging, metrics, health probes as they wish.
This ties back to https://github.com/java-operator-sdk/java-operator-sdk/issues/1340
Ok so let me formulate the behavior:
- If the operators is able to watch at least one namespace regarding the primary resource it will start, and try to periodically establish watch the remaining namespaces if any is configured.
- if there is only one namespace or cluster scope or cluster scoped resource configured to be watched as primary resource and the controller is not able to do so, it will fails. Optionally do a callback so users are able to react with custom logic.
- If not the primary but the secondary informers are not able to start, this has not effect on startup. Just will be try to create watches periodically on the target namespaces, and log if it is successful or not (on warning? level if not).
Will create a separate issue for probes related to #1340 and this.
@gyfora please take a look on the comment in the related umbrella issue: https://github.com/java-operator-sdk/java-operator-sdk/issues/1422#issuecomment-1227190373
and please let us know what do you think. It seems that this is quite risky to support IMO.