sentry-kubernetes
Memory leak
sentry-kubernetes appears to leak memory on our two clusters; at least, its memory usage has only gone up over a few weeks of use.
We have RBAC enabled and have given sentry-kubernetes' service account the ClusterRole view.
As a workaround, we've now set the following resource requests/limits for the sentry-kubernetes container:
resources:
  requests:
    memory: 75Mi
    cpu: 5m
  limits:
    memory: 100Mi
    cpu: 30m
We expect these limits to restart the container every few days.
Interesting. I don't maintain any of my own global state, so it must be something inside of the kubernetes or raven-python clients, hmmm...
I observe the same problem with the pod getting OOM killed quite frequently.
Green is usage, light blue is limit. The cases on the graph where the cliff occurs before the limit are probably due to the pod being rescheduled during a cluster upgrade.
I am getting this as well. In our case it used 6 GB of RAM before getting force-evicted by Kubernetes.
I ended up creating a lightweight reporter in Go that uses 7-10 MB of RAM in total and reports pod failures: https://github.com/stevelacy/go-sentry-kubernetes
I wonder if this is due to the use of breadcrumbs?
FWIW I made another alternative: https://github.com/wichert/k8s-sentry. That means there are now three alternatives:
- this version. Currently not really usable since it has this memory leak and isn't actively maintained.
- @stevelacy's go-sentry-kubernetes. A small Go reporter without the memory leak. This monitors pods for changes and reports those to Sentry. Includes very little information in error messages.
- my k8s-sentry. Another small Go reporter. This uses the same approach as this project: it monitors events, and submits warning and error events to Sentry. It includes a fair bit of information from events (object kind, namespace, component, event reason, event message, action taken, etc.). I might extend it to fetch the involved objects so it can add their labels and extra information as well (which would solve #13).
@wichert I like the way you parsed the events with the AddFunc rather than the UpdateFunc. I'll probably change mine in a similar way; coercing the description and reason didn't work as well as I hoped.
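The event-based approach described in the last item can be sketched roughly as below. This is a minimal illustration and not code from k8s-sentry or this project: the `Event` struct is a hypothetical, trimmed-down stand-in for a Kubernetes event, and `ShouldReport`, `SentryLevel`, and `Tags` are made-up helper names. It also assumes that events typed "Warning" or "Error" are the ones worth forwarding.

```go
package main

// Event is a hypothetical, trimmed-down stand-in for a Kubernetes event.
// A real reporter would read these fields from the objects its event watch delivers.
type Event struct {
	Type      string // event type, e.g. "Normal" or "Warning"
	Reason    string // short machine-readable reason, e.g. "BackOff"
	Message   string // human-readable description
	Kind      string // kind of the involved object, e.g. "Pod"
	Namespace string // namespace of the involved object
	Component string // component that emitted the event
}

// ShouldReport returns true for events that should be forwarded to Sentry.
// Assumption: only "Warning" and "Error" events are interesting; "Normal" is dropped.
func ShouldReport(e Event) bool {
	return e.Type == "Warning" || e.Type == "Error"
}

// SentryLevel maps the event type to a Sentry severity level.
func SentryLevel(e Event) string {
	if e.Type == "Error" {
		return "error"
	}
	return "warning"
}

// Tags collects the non-empty event fields as key/value tags,
// so that issues can be filtered by kind, namespace, and so on.
func Tags(e Event) map[string]string {
	tags := map[string]string{}
	add := func(key, value string) {
		if value != "" {
			tags[key] = value
		}
	}
	add("kind", e.Kind)
	add("namespace", e.Namespace)
	add("component", e.Component)
	add("reason", e.Reason)
	return tags
}
```

A reporter built this way would call `ShouldReport` on every event it receives and, for the ones that pass, send `Message` to Sentry with the level from `SentryLevel` and the tags from `Tags`. Because nothing is accumulated between events, memory use should stay flat.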
Edit: my go client now also provides detailed error information
@wichert Great to see an alternative. Do you also provide a manifest or Helm file?
@zifeo I will, once I have a first release and an image on Docker Hub.
@zifeo I forgot to mention, but I have example manifests now.
Hi, the agent has a completely different implementation now (rewritten in Go), so I'll close this one as outdated. Thanks everyone for the discussion and for listing alternatives 👍