Memory spikes every hour after reducing
Hi all
We recently had an issue with Grafana Agent: memory usage gradually increased until it hit 100%, which led to the pod being OOM-killed. After we increased the memory, we made a few modifications to tune the agent's performance.
We reduced the max WAL time to only 2 hours:
maxWALTime: 2h
and configured remote_write as follows:
maxSamplesPerSend: 1000
maxShards: 200
capacity: 5000
After running this new configuration over the weekend, we noticed some spikes in memory (10%) and CPU usage (20%). I'm assuming this happens because samples are being discarded from the WAL.
Please let me know if this is normal and whether there is a better way to configure Grafana Agent.
It sounds like you have a high series churn rate: brand-new series with new labels are coming in faster than old series can be deleted.
Try setting walTruncateFrequency: "15m" (next to where you'd set maxWALTime). This should help old series get deleted faster and keep your memory from increasing forever.
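For reference, here is a rough sketch of how the settings from this thread might sit together. It assumes they are set on a Grafana Agent Operator MetricsInstance resource, which the original post does not confirm; the resource name and the remote write URL are placeholders, and the nesting may differ if you configure the agent another way (for example through Helm values).

```yaml
# Sketch only. Assumption: a Grafana Agent Operator MetricsInstance.
# Put walTruncateFrequency wherever maxWALTime is already set in your config.
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: example                  # hypothetical name
spec:
  maxWALTime: 2h                 # value from the original post
  walTruncateFrequency: 15m      # value suggested in the reply
  remoteWrite:
    - url: https://example.invalid/api/v1/push   # placeholder endpoint
      queueConfig:
        maxSamplesPerSend: 1000  # values from the original post
        maxShards: 200
        capacity: 5000
```

After applying a change like this, it is worth watching memory over a few truncation cycles to see whether the hourly spikes get smaller.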