notification-controller
Add possibility to configure wildcard namespace value in Alert eventSources
It would be nice to pick up all error events within a cluster using a wildcard namespace value in the eventSources section of an Alert - similar to how it works for name: https://github.com/fluxcd/notification-controller/blob/fbf1ea0413e12fe58e6386972468a152c42b215c/internal/server/event_handlers.go#L77
Currently you'll either need to duplicate Provider (including any secrets) and Alert resources to all relevant namespaces, or create a "global" Alert which references other namespaces. It would be simpler and less error-prone to allow namespace: "*".
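For illustration, a minimal sketch of what the proposed syntax could look like (not supported today; the kinds, names, and provider are examples):

```yaml
# Proposed, not currently supported: a single Alert catching error events
# from every namespace. Kinds and names here are examples only.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: all-errors
  namespace: flux-system
spec:
  providerRef:
    name: slack            # assumes a Provider named "slack" in the same namespace
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: '*'
      namespace: '*'       # the wildcard this issue asks for
    - kind: HelmRelease
      name: '*'
      namespace: '*'
```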
I don't see any issue implementing this, as it would just be a catch-all for all events in all namespaces, or for events from a specific resource kind in all namespaces. Since we already allow event sources from multiple namespaces, it would just act as a helper to avoid having to type out each namespace. One thing to keep in mind is that there would probably be a lot of events generated from doing this.
@stefanprodan do you have any opinion about this?
This would break multi-tenancy, imagine a tenant creating a "global" alert in their namespace and receiving events with sensitive information about all the other tenants. I find this unacceptable; imagine if AWS allowed anyone to route all events from CloudWatch regardless of account.
I have also been thinking about that. In theory it would be possible if we were able to limit the event sources to the permissions of the assumed service account. We obviously cannot allow anyone to export all events in the cluster.
Not sure if it helps, but it'd be fine if it was only allowed for alerts defined in the flux-system namespace.
> Not sure if it helps, but it'd be fine if it was only allowed for alerts defined in the flux-system namespace.
This would work for us.
That said, I think the issue around multi-tenancy is a cluster policy issue rather than something that needs to be solved by this project. It's up to the cluster administrator to set policy on who can create Alert resources, and whether they can create them with a namespace of *.
Perhaps as a stopgap, the ability to have * for the namespace could be put behind a flag, so that cluster administrators have a simple way to allow or disallow this functionality?
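For the cluster-policy side of this, a minimal RBAC sketch of restricting who may create Alert and Provider objects (role and group names are illustrative; note that RBAC alone cannot inspect the eventSources field, so it cannot distinguish namespace: "*" from any other value):

```yaml
# A minimal sketch, assuming only a platform team should manage Alert and
# Provider objects. Role and group names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flux-alert-editor
rules:
  - apiGroups: ["notification.toolkit.fluxcd.io"]
    resources: ["alerts", "providers"]
    verbs: ["create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux-alert-editors
subjects:
  - kind: Group
    name: platform-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: flux-alert-editor
  apiGroup: rbac.authorization.k8s.io
```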
How about a new kind, ClusterAlert, in order to be alerted on all objects cluster-wide? Like ClusterRole and Role.
> This would break multi-tenancy, imagine a tenant creating a "global" alert in their namespace and receiving events with sensitive information about all the other tenants. I find this unacceptable; imagine if AWS allowed anyone to route all events from CloudWatch regardless of account.
This is in no way an equivalent comparison. We allow Flux in a single namespace (flux-system) to manage resources in all other namespaces without explicitly opening up each namespace to the Flux controllers. Following the same approach, we should be able to set up notifications and alerts for those resources.
I, as many others, would have a use case for the proposed behaviour, and would be happy to put some work towards it. I've used Flux extensively for about a year now but never really dug into the codebase.
For what it's worth, I feel like @nvanheuverzwijn's suggestion of having a non-namespaced ClusterAlert resource is the cleaner approach, because it doesn't break any existing functionality, accurately represents what namespace: '*' would try to achieve on a more abstract level, and would allow using RBAC and other familiar mechanisms for multi-tenancy scenarios.
Could this issue be put onto the agenda for the next meeting or something? It seems that there's some disagreement about if and how this should be implemented at all, and I'd like to have a clear goal for what a PR solving this issue should entail.
A ClusterAlert doesn't solve much because it would refer to a ClusterProvider, which in turn would refer to a Kubernetes secret, and the Kubernetes team rejected the proposal of having ClusterSecrets. I'm for revisiting the namespace wildcard option after RFC-0003 gets approved.
Since #319 implemented a way to disable cross-namespace references, is this something you would consider now @stefanprodan?
To add some context here, a piece of feedback I receive from developers after rolling out flux2 is that they don't find out quickly enough when their helm releases fail. Our cluster setup is a namespace per customer (lots of namespaces). We are monitoring this externally via Datadog checks, but it would be great if the infrastructure team could take care of this for developers. Our current workaround is enumerating namespaces and creating lots of objects, but it's slightly error-prone and noisy.
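As a sketch of what that enumeration looks like (namespaces and the provider name are illustrative), the list has to be updated by hand every time a customer namespace is added:

```yaml
# A sketch of the current workaround: one "global" Alert in flux-system that
# enumerates every customer namespace by hand. Names are illustrative.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: helm-failures
  namespace: flux-system
spec:
  providerRef:
    name: datadog          # assumes a matching Provider in flux-system
  eventSeverity: error
  eventSources:
    - kind: HelmRelease
      name: '*'
      namespace: customer-a
    - kind: HelmRelease
      name: '*'
      namespace: customer-b
    # ...one entry per customer namespace
```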
+1 - Would be nice to have this feature and allow wildcards for namespaces.
How are others working around this issue? I'm thinking of writing a poll-based system to cron the flux CLI.
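A rough sketch of that kind of polling workaround is below; the image tag, schedule and service account are assumptions, and the job only lists not-ready releases rather than sending notifications anywhere:

```yaml
# A rough sketch of polling with the flux CLI from a CronJob. Image, tag,
# schedule and RBAC are assumptions; the output would still need to be
# shipped somewhere (e.g. scraped by a log-based alert).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: flux-status-poll
  namespace: flux-system
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: flux-status-poll   # needs read access to Flux objects
          restartPolicy: Never
          containers:
            - name: flux
              image: ghcr.io/fluxcd/flux-cli:v2.2.3   # assumed image and tag
              command:
                - flux
                - get
                - helmreleases
                - --all-namespaces
                - --status-selector
                - ready=false
```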
There is already an option to prevent cross-namespace alerts: https://fluxcd.io/flux/components/notification/alert/#disable-cross-namespace-selectors
So there should be nothing stopping us from implementing * wildcard namespaces, and simply not allowing them if that flag is enabled.
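For reference, that restriction is set on the controller itself; a sketch of enabling it with a kustomize patch on the flux-system components (the flag name comes from the linked docs; the layout assumes a standard bootstrap):

```yaml
# A sketch of enabling the existing cross-namespace restriction, assuming a
# standard flux bootstrap layout in the flux-system directory.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --no-cross-namespace-refs=true
    target:
      kind: Deployment
      name: notification-controller
```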
I have stumbled over the inability to take a centralized approach to notifications in my clusters as well. All of my clusters are managed with HelmReleases for every application, and, as is typical, I have one namespace per application. I decided to keep the HelmRelease resources inside the application namespaces, because that's also where the Helm chart installs: when the HelmRelease resource lives in flux-system, for example, the chart doesn't show up when doing a helm ls inside the application namespace. The inability of an Alert resource to reference a Provider in another namespace, combined with its inability to pick up events from other namespaces, leaves me unable to use it at all: I would have to deploy an Alert resource, a Provider resource AND the necessary secrets in every single application namespace. That's no fun, and security-wise it is worse than a centralized approach where a "global" Alert resource can take care of the entire cluster.
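To illustrate the duplication described above, this is roughly the trio that would have to be copied into every application namespace (names, the Slack provider type, and the secret layout are illustrative):

```yaml
# Illustrative only: the Secret/Provider/Alert trio that would need to be
# duplicated in every application namespace under the current model.
apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
  namespace: app-1
stringData:
  address: https://hooks.slack.com/services/...   # placeholder webhook URL
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: slack
  namespace: app-1
spec:
  type: slack
  secretRef:
    name: slack-webhook
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: app-1-errors
  namespace: app-1
spec:
  providerRef:
    name: slack
  eventSeverity: error
  eventSources:
    - kind: HelmRelease
      name: '*'
```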