alloy Memory spikes every hour after reducing

Open maksimize opened this issue 2 years ago • 1 comments

Hi all

Recently we had an issue with Grafana Agent: its memory usage gradually increased until it hit 100%, and the pod was OOM-killed. After increasing the memory limit, we made a few modifications to tune the agent's performance. We reduced the max WAL time to only 2 hours (maxWALTime: 2h) and configured remote_write as follows:

      maxSamplesPerSend: 1000
      maxShards: 200
      capacity: 5000

After running this new configuration over the weekend, we noticed spikes in memory (10%) and CPU usage (20%). I assume this is caused by samples being discarded from the WAL.

Please let me know if this is normal and whether there is a better way to configure Grafana Agent.

[Image: Screenshot 2023-01-30 at 14 25 57]

Jan 30 '23 13:01 maksimize

It sounds like you have a high series churn rate - brand new series with new labels are coming in faster than they can be deleted.

Try setting walTruncateFrequency: "15m" (next to where you'd set maxWALTime). This should help old series get deleted faster and keep your memory from increasing forever.
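
As a rough sketch of that suggestion, the two settings would sit side by side. Only the two keys and their values come from this thread; where they live in your configuration depends on how the agent is deployed, so the placement shown here is an assumption.

```yaml
# Sketch only: the parent section these keys belong to is not shown in the
# thread and is assumed to be wherever maxWALTime is already set.
maxWALTime: 2h                 # value from the original post
walTruncateFrequency: "15m"    # suggested value; truncates the WAL more often
```

With a shorter truncation interval, series that stopped receiving samples should be dropped from memory sooner, which is the effect the reply is aiming for.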

Jan 30 '23 23:01 rfratto
