robusta
CRD support for watching?
Is your feature request related to a problem? Please describe.
I'd like to be able to watch for changes to CRDs, but the docs don't describe that. It appears it's not supported.
Describe the solution you'd like
Extend support to register CRDs to trigger on changes.
Describe alternatives you've considered
Doing it myself (writing a small service), or looking at Kubewatch and the open PRs there for how this may be supported.
Additional context
N/A
Thank you!
Hey, it's not supported today but we'd love to help with this.
We won't be able to implement it ourselves in the next few weeks, but we can support any work you do on this, review a PR (and merge it), and of course answer any questions you have about the code base and how to implement it.
There are three major options here:
- If you want to only use Kubewatch (and not the Robusta features) then feel free to open a PR for our Kubewatch fork and we'll be happy to review and merge it. We have a number of improvements over the original Kubewatch and until we can get them merged upstream we'll continue to maintain and support the fork. Among other things, we've added webhook support for sending all changed fields, fixed vulnerable dependencies, and added support for more Kubernetes types.
- If you want Robusta's full feature set for CRDs (i.e. the ability to run Robusta actions, the ability to filter out Kubernetes changes you don't care about, and the ability to send output to any Robusta sink) then you'll have to both add support to Kubewatch and make some small changes to the robusta-runner so that it knows what to do with the events that Kubewatch sends it. (Internally, Robusta uses our Kubewatch fork to listen to events and then forward them to the runner which takes action on them.)
- Generally speaking, the major downside of Kubewatch's implementation is that it requires code changes to add on new resource types (including CRDs) and it has to be recompiled each time. It's entirely possible to implement something like Kubewatch using generic Kubernetes APIs and to take as input a list of resources to watch (e.g. from a yaml config file). I started work a while ago on a Kubewatch alternative that uses the generic APIs. I never did a release, but the code works and it might be easier to add CRD support to this than to Kubewatch itself. Eventually we'd like to get this released and integrated into Robusta itself, but it hasn't been prioritized for some time. If you choose to go this route, I can help you stream the data to Robusta as well, so you'll be able to use the full Robusta feature set.
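To make the generic approach in the third option a bit more concrete, here is a minimal sketch of how a list of resources (as it might be read from a YAML config) could be turned into watch request paths. The `WatchedResource` type, the `watch_path` helper, and the cert-manager entry are illustrative assumptions, not code from Kubewatch or Robusta. Only the URL layout follows the Kubernetes API conventions: `/api/<version>` for the core group and `/apis/<group>/<version>` for everything else, CRDs included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WatchedResource:
    """One entry from a hypothetical 'resources to watch' config."""
    group: str                       # "" means the core API group (pods, services, ...)
    version: str                     # e.g. "v1"
    plural: str                      # e.g. "pods", or a CRD's plural name
    namespace: Optional[str] = None  # None watches across all namespaces


def watch_path(res: WatchedResource) -> str:
    """Build the API server path for a watch request on this resource."""
    # Core resources live under /api/<version>; all other groups,
    # including CRDs, live under /apis/<group>/<version>.
    if res.group == "":
        root = f"/api/{res.version}"
    else:
        root = f"/apis/{res.group}/{res.version}"
    scope = f"/namespaces/{res.namespace}" if res.namespace else ""
    return f"{root}{scope}/{res.plural}?watch=true"


resources = [
    WatchedResource(group="", version="v1", plural="pods", namespace="default"),
    # Example CRD entry: adding it is a config change, not a code change.
    WatchedResource(group="cert-manager.io", version="v1", plural="certificates"),
]
paths = [watch_path(r) for r in resources]
```

With this shape, supporting a new CRD would only mean adding another entry to the list, which is the property that a recompile-per-type design lacks.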
Let me know what you're interested in. I'd also love to hear about your use case to better understand what would work for you.
I feel like what I actually need is to watch the kubernetes audit log, which does tell me about all the operations that happen across the cluster. What I haven't figured out yet is to get the actual diff of what's changing. I'm simply seeing the requested fields of the object.
Sorry to be a nag, but I'd still love to hear more about the use case! We've considered adding an integration with the Kubernetes audit log before but don't have enough use cases to support it yet.
So if I can better understand what you're trying to achieve (and why) maybe we can do it!
@aantn I'm not actually sure this is possible with just the auditlog, I'm trying to get some more information from a kubernetes maintainer about this.
💡 Use Case
The use case though is basically this repo, or kubewatch, or https://github.com/grafana/kubernetes-diff-logger, or Komodor.com
I want to track changes happening to objects and, at a minimum (v1), just report those changes, allowing developers to understand what's going on in the cluster.
Since the audit logs are structured json logs, it's easy to query by any field in something like NewRelic, as well as do many sorts of visualizations of the data.
⛰️ The Challenge
I can currently see the requests that are coming in (create, update, patch) and it's easy to find which of these succeed. What it doesn't tell me is what's actually changing, just that there was a request to set all the fields of the object, and what the response was (which I believe is important for seeing the changes made by mutating webhook configurations).
If there's no way to get the actual diff via the logs, then I would like to create an "event-sourcing" application, that creates a baseline "checkpoint" - basically loop through all api resources (available) in the cluster, and get each resource (for namespaced resources, loop through each namespace to get those).
From there, I'll have a baseline state of every object. Then if I can just watch the audit log, I can reconstruct the changes to every object. Any create event would just register a new baseline state for that object.
I believe this would provide a very solid method of tracking changes, that would not be as susceptible to missing changes if the app crashes/restarts or has to be redeployed. I think there's a lot of very useful information in the audit log too, that is probably unavailable from just a watch. Except in the cases of generating the initial "checkpoint" of existing objects, this app would not actually need RBAC permissions to the cluster to get objects.
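A minimal sketch of the checkpoint-plus-replay idea, assuming audit events in the `audit.k8s.io/v1` format with `responseObject` populated (which requires the `RequestResponse` audit level for the resources in question). The `apply_event` and `diff` helpers and the in-memory `state` dict are hypothetical names for illustration, not an existing library.

```python
import copy
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str, str]            # (resource, namespace, name)
State = Dict[Key, Dict[str, Any]]     # last known version of every object
Change = Tuple[str, Any, Any]         # (field path, old value, new value)


def diff(old: Any, new: Any, path: str = "") -> List[Change]:
    """Return one entry for every leaf value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes: List[Change] = []
        for k in sorted(set(old) | set(new)):
            changes += diff(old.get(k), new.get(k), f"{path}.{k}" if path else k)
        return changes
    return [] if old == new else [(path, old, new)]


def apply_event(state: State, event: Dict[str, Any]) -> List[Change]:
    """Fold one audit event into the state and return the resulting diff."""
    # Only completed, successful requests change the stored state.
    code = event.get("responseStatus", {}).get("code", 0)
    if event.get("stage") != "ResponseComplete" or not 200 <= code < 300:
        return []
    ref = event.get("objectRef", {})
    new = event.get("responseObject")
    # Defensive fallback: take the name from the object when objectRef lacks it.
    name = ref.get("name") or ((new or {}).get("metadata", {}).get("name", ""))
    key = (ref.get("resource", ""), ref.get("namespace", ""), name)
    verb = event.get("verb")
    if verb == "delete":
        return diff(state.pop(key, {}), {})
    if verb in ("create", "update", "patch") and new is not None:
        old = state.get(key, {})
        # The response object is what was stored, after any webhook mutations.
        state[key] = copy.deepcopy(new)
        return diff(old, new)
    return []


# Baseline "checkpoint", as it might be built by listing existing objects.
state: State = {("configmaps", "default", "app"): {"data": {"mode": "a"}}}
event = {
    "stage": "ResponseComplete",
    "verb": "update",
    "objectRef": {"resource": "configmaps", "namespace": "default", "name": "app"},
    "responseStatus": {"code": 200},
    "responseObject": {"data": {"mode": "b"}},
}
changes = apply_event(state, event)
```

Here `changes` is `[("data.mode", "a", "b")]`. A real implementation would also need durable storage for the state and some way to detect gaps in the log stream.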
🔧 The Wrench
There would likely need to be several different providers for how to get the logs. Depending on security requirements, it may be useful to have the logs shipped to this application as a sink (like with fluentbit), which would allow k8s operators to choose exactly what events this application can see (such as access to configmaps/secrets with sensitive information).
A little bit of a wrench, but also could be a benefit, as the app would only ever see logs that were explicitly allowed for it to see, reducing the application requirements and possible attack vectors.
❓ Thoughts?
What do you think about this approach?
It makes total sense and it's doable with Robusta. We can't yet satisfy every requirement, but what you're trying to do is one of Robusta's major goals. So we're committed to fixing the missing parts too.
You can find a good intro to our change-tracking capabilities in the relevant tutorial. I'll cover some additional stuff below.
What you can do with Robusta today
- Push model: You can send push messages to Slack/MSTeams/Kafka and other destinations for changes you care about. This is super granular as the above tutorial explains. Full docs on this are under the resource_babysitter action.
- Robusta UI: This tracks changes to cluster resources all the time. The data is always there when you need it and it's correlated with your alerts. This is powered by the same exact mechanisms as everything else here. It's just that the destination is smarter about how to display it.
- Grafana integration: You can add annotations to your existing Grafana graphs showing what changed and when. See the relevant action in the docs.
- Reverse-gitops: you can store and audit changes with a git repository as the storage backend. Resources are stored in the filesystem based on their cluster, namespace, and name. See the docs.
- Custom Robusta playbooks: I could be wrong, but I think this is what you've been considering so far. It's good but it's really the building block used to implement the above.
How it works
We're listening to API server changes using the WATCH API. This works extremely well. From a security perspective, you can also configure which resources are/aren't readable using RBAC so you can get all the granularity you want. (Robusta can provide similar granularity regarding the destinations you ship to even if you choose to let Robusta see everything.)
The only downside compared to using the audit log is that tracking who performed changes is less straightforward. It's doable - there are a number of possible solutions - but with the audit log you would get this for free.
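For readers unfamiliar with the WATCH API, here is a minimal sketch of consuming a watch stream, which arrives as one JSON object per line with a `type` and a full `object`. This is not Robusta's or Kubewatch's actual code; `track_changes` and the `cache` dict are hypothetical names. The sample events also show why the "who" is harder here: the payload carries the object but no user identity.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

Obj = Dict[str, Any]
Seen = Tuple[str, str, Optional[Obj], Optional[Obj]]  # (type, uid, old, new)


def track_changes(lines: Iterable[str], cache: Dict[str, Obj]) -> Iterator[Seen]:
    """Yield the previous and current version of each object that changed."""
    for line in lines:
        event = json.loads(line)
        kind = event["type"]
        if kind not in ("ADDED", "MODIFIED", "DELETED"):
            continue  # skip BOOKMARK and ERROR events in this sketch
        obj = event["object"]
        uid = obj["metadata"]["uid"]
        old = cache.get(uid)
        if kind == "DELETED":
            cache.pop(uid, None)
            yield (kind, uid, old, None)
        else:
            cache[uid] = obj
            yield (kind, uid, old, obj)


# Two sample events, shaped like lines of a watch response.
lines = [
    json.dumps({"type": "ADDED", "object": {
        "metadata": {"uid": "u1", "name": "web", "resourceVersion": "10"},
        "spec": {"replicas": 1}}}),
    json.dumps({"type": "MODIFIED", "object": {
        "metadata": {"uid": "u1", "name": "web", "resourceVersion": "11"},
        "spec": {"replicas": 3}}}),
]
cache: Dict[str, Obj] = {}
seen = list(track_changes(lines, cache))
```

Because each event carries the whole object, the watcher has to keep the previous version itself in order to report what changed.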
Stuff we're still adding to Robusta
- Tracking who made changes
- Tracking changes to CRDs
Questions
Do any of the above 5 capabilities solve what you're trying to do?
The audit log gives you the who. It tells you about other "failed" changes too (useful to see), and since it's following the audit log, it means you will never miss changes: in theory you should be able to resend logs you missed if the app was offline. It doesn't seem that any of these things are possible without watching the audit log.
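As a small illustration of the "who" and the "failed" changes, the sketch below pulls both out of audit events. It assumes events in the `audit.k8s.io/v1` format (`user.username`, `verb`, `objectRef`, `responseStatus.code`); the `failed_writes` helper and the sample events are made up for the example.

```python
from typing import Any, Dict, List

WRITE_VERBS = {"create", "update", "patch", "delete"}


def failed_writes(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Summarize write requests that the API server rejected."""
    rejected = []
    for e in events:
        code = e.get("responseStatus", {}).get("code", 0)
        if e.get("verb") in WRITE_VERBS and code >= 400:
            ref = e.get("objectRef", {})
            rejected.append({
                "who": e.get("user", {}).get("username"),
                "verb": e["verb"],
                "resource": ref.get("resource"),
                "name": ref.get("name"),
                "code": code,
            })
    return rejected


# Made-up sample events: one successful patch and one forbidden delete.
events = [
    {"verb": "patch", "user": {"username": "alice"},
     "objectRef": {"resource": "deployments", "name": "web"},
     "responseStatus": {"code": 200}},
    {"verb": "delete", "user": {"username": "bob"},
     "objectRef": {"resource": "secrets", "name": "db-password"},
     "responseStatus": {"code": 403}},
]
rejected = failed_writes(events)
```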
Yes, that's correct. Regarding not missing changes, what's a realistic scenario where you would miss changes with the WATCH API? I understand that there are theoretical cases where it can happen, but in practice it works extremely well and I've yet to see a real-world case where changes go missing.
All that has to happen for you to miss something is for the watcher to crashloop. Any event that occurs during that time would be missing, and there's no way to get it back.
Typically this isn't an actual problem for most Kubernetes controllers. During runtime, the ideal is that a controller can react to changes as they occur, but at startup it should also reach out to find out the current state of objects in the cluster so it can act appropriately. A crashloop or even just a redeployment may have little to no effect other than on reaction time (which is solved with leader election and multiple replicas).
But for something that cares not about current state, but changes over time, any downtime means potential data loss.
Yes, that's true. There are high-availability solutions based on multiple instances, but they aren't 100% reliable either.
If you want something hermetic, you need the audit API or possibly an admission control based solution.
That covers the backend-side of it. You'll still need a convenient way to browse / view the changes and correlate with alerts.
If you're interested in doing something with Robusta, happy to discuss more in detail on our Slack or by videochat. We'd have to do some custom work here. You mentioned commercial tools above, so if you have a budget to sponsor the work it would help with prioritization.
The use case though is basically this repo, or kubewatch, or https://github.com/grafana/kubernetes-diff-logger, or Komodor.com
FWIW, I believe all these are using the WATCH API so the fundamental issue is the same.
It would be very nice to implement a hermetic solution in Robusta based on the audit API and to provide a definitive way of doing it.
Ya, they all use watch, and I'm not saying it's bad, just saying that I want something that has a higher reliability, and can report natively on information that is only available in the audit log. (There's no Audit API AFAIK).
What I'd like to do is maybe work on a Go library for this. It might not be directly integrated into Robusta right away, but maybe Robusta could implement either an enrichment API or a configurable strategy pattern to decide whether to WATCH or use the audit log (sidecar?).
It sounds good to me. Happy to support this on the Robusta end.
We can fetch data by pull (e.g. with an enricher) or we can receive events by push (with a new trigger for audit events). Both work fine.
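One way the configurable strategy idea from this thread could look, purely as a sketch: each source normalizes its raw events into a common change record, and a config value picks the source. None of these names (`from_watch`, `from_audit`, `STRATEGIES`, `load_changes`) exist in Robusta; they are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Change = Dict[str, Any]
Raw = Iterable[Dict[str, Any]]


def from_watch(raw: Raw) -> Iterator[Change]:
    """Normalize WATCH API events; the user is not available from this source."""
    for e in raw:
        meta = e["object"]["metadata"]
        yield {"name": meta["name"], "action": e["type"].lower(), "who": None}


def from_audit(raw: Raw) -> Iterator[Change]:
    """Normalize audit log events; these include the requesting user."""
    for e in raw:
        yield {
            "name": e["objectRef"]["name"],
            "action": e["verb"],
            "who": e["user"]["username"],
        }


# Hypothetical registry: the key would come from configuration.
STRATEGIES: Dict[str, Callable[[Raw], Iterator[Change]]] = {
    "watch": from_watch,
    "audit": from_audit,
}


def load_changes(strategy: str, raw: Raw) -> List[Change]:
    """Run the configured strategy over raw events from that source."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown change source: {strategy!r}")
    return list(STRATEGIES[strategy](raw))


watch_changes = load_changes("watch", [
    {"type": "MODIFIED", "object": {"metadata": {"name": "web"}}},
])
audit_changes = load_changes("audit", [
    {"verb": "patch", "user": {"username": "alice"}, "objectRef": {"name": "web"}},
])
```

Downstream code (a push trigger or a pull-style enricher) would then only deal with the normalized records, regardless of which source produced them.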