
Memory spikes every hour after reducing

Open · maksimize opened this issue 2 years ago · 1 comment

Hi all

Recently we had an issue with Grafana Agent: its memory usage gradually increased until it hit 100%, and the pod was OOM-killed. After increasing the memory limit, we made a few changes to tune the agent's performance. We reduced the maximum WAL time to 2 hours (maxWALTime: 2h) and configured the remote_write queue as follows:

      maxSamplesPerSend: 1000
      maxShards: 200
      capacity: 5000
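For context, here is one way these settings might be laid out together. This is a sketch that assumes they are managed through the Grafana Agent Operator's MetricsInstance resource. The camelCase field names suggest this, but the issue does not say so. The resource name and the remote_write URL are hypothetical placeholders.

```yaml
# Sketch only: assumes the Grafana Agent Operator MetricsInstance CRD.
# metadata.name and the url below are hypothetical placeholders.
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: example-metrics          # hypothetical
spec:
  # Maximum age of series/samples kept in the WAL before they may be dropped.
  maxWALTime: 2h
  remoteWrite:
    - url: https://example.com/api/prom/push   # hypothetical endpoint
      queueConfig:
        capacity: 5000           # samples buffered per shard
        maxShards: 200           # upper limit on concurrent senders (shards)
        maxSamplesPerSend: 1000  # samples per outgoing request
```

If capacity follows Prometheus's per-shard semantics, the worst-case buffer across all shards is 200 × 5000 = 1,000,000 samples held in memory. This is worth keeping in mind when raising maxShards and capacity together.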

After running this new configuration over the weekend, we noticed spikes in memory usage (10%) and CPU usage (20%). I assume this happens when samples are discarded from the WAL.

Please let me know whether this is normal, and whether there is a better way to configure Grafana Agent.

[Six screenshots dated 2023-01-30, not reproduced here.]

maksimize, Jan 30 '23 13:01

It sounds like you have a high series churn rate: brand-new series with new labels are coming in faster than old ones can be deleted.

Try setting walTruncateFrequency: "15m" (next to where you'd set maxWALTime). This should help old series get deleted faster and keep your memory from increasing forever.
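Assuming the same MetricsInstance layout as in the question, the suggested change would place walTruncateFrequency alongside maxWALTime in the spec. The thread does not show the full resource, so treat this as a sketch:

```yaml
# Sketch only: assumes the Grafana Agent Operator MetricsInstance spec.
spec:
  maxWALTime: 2h
  # Truncate the WAL every 15 minutes, so series that have stopped
  # receiving samples are garbage-collected from memory sooner.
  walTruncateFrequency: "15m"
```

Truncating more often means smaller, more frequent truncation passes, which should lower the peak number of in-memory series. The cost is running that work more often. Grafana Agent's documentation lists a 60-minute default for the WAL truncation interval. If that default applies here, it would also line up with the hourly spikes described in the issue title.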

rfratto, Jan 30 '23 23:01