Memory and CPU leak in agent and service

Open chriswessels opened this issue 4 years ago • 7 comments

Hey there,

I'd been running release sha-54d4905 on Kubernetes for a few days before upgrading to a new release, and noticed a huge drop in memory usage for the indexer-agent and indexer-service.

Looking back at memory usage for the deployments:

[Image: Screenshot 2020-09-01 at 22 23 40]

[Image: Screenshot 2020-09-01 at 22 25 40]

Both seem to show fairly consistent, linear growth in memory usage over time.

Let me know if there are any further details I can provide!

Chris

chriswessels avatar Sep 01 '20 21:09 chriswessels

Looks like it's leaking CPU cycles too:

[Image: Screenshot 2020-09-01 at 22 33 40]

[Image: Screenshot 2020-09-01 at 22 33 48]

chriswessels avatar Sep 01 '20 21:09 chriswessels

Looks like indexer-agent and indexer-service still have a memory leak problem. I've been using a Kubernetes CronJob to automatically restart them every day to keep memory usage under control. Here's a gist for anyone that wants to do the same: https://gist.github.com/chriswessels/8271f82a0ae7342d7d0822ea1e796246
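For readers who would rather script the restart than copy the gist, here is a minimal Python sketch of one common way to trigger a rolling restart. It only builds the patch body that `kubectl rollout restart` applies to a Deployment (a `kubectl.kubernetes.io/restartedAt` annotation on the pod template). This is not necessarily what the gist does, and the function name `restart_patch` is made up for illustration.

```python
from datetime import datetime, timezone

# Annotation that `kubectl rollout restart` sets on the pod template.
RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def restart_patch(now=None):
    """Return a patch body that changes the pod template and so forces a rollout.

    Hypothetical helper: the name and signature are illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "spec": {
            "template": {
                "metadata": {
                    # A new timestamp makes the template differ from the
                    # previous one, so Kubernetes replaces the pods.
                    "annotations": {RESTART_ANNOTATION: now.isoformat()}
                }
            }
        }
    }


if __name__ == "__main__":
    print(restart_patch())
```

With the official Kubernetes Python client, this body could be passed to `AppsV1Api().patch_namespaced_deployment(name, namespace, body)` from a daily job. The Deployment names and namespace depend on your setup and are not given in this thread.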

chriswessels avatar Nov 21 '20 12:11 chriswessels

I'm also observing this issue on v0.18.6. However, I've found a good way to accelerate it: send lots of GraphQL queries to the indexer-agent management endpoint. I've also noticed that queries take longer to execute as memory usage grows.

However, I am not sure whether certain types of queries are worse than others. Read below.

Context

As @fordN and @chriswessels already know, I'm currently running experiments that require constantly updating the Agora models. In this particular case, I am updating the model variables for each subgraph every 3 minutes. In the plot below, every drop is the OOM killer killing the indexer-agent. The frequency increased around May 10th, when I deployed my experiment to the indexer.

[Image]
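For anyone who wants to try reproducing the acceleration described above, here is a rough Python sketch that sends many GraphQL queries to an endpoint and records per-request latency, so early and late timings can be compared. Everything specific here is an assumption: the function names, the URL in the usage comment, and the trivial `{ __typename }` query, which is valid on any GraphQL server but may not be the kind of query that triggers the leak.

```python
import json
import time
import urllib.request


def build_request(url, query="{ __typename }"):
    """Build a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def hammer(url, count, send=urllib.request.urlopen):
    """Send `count` queries sequentially and return each latency in seconds.

    `send` is injectable so the loop can be exercised without a live server.
    """
    latencies = []
    for _ in range(count):
        request = build_request(url)
        start = time.perf_counter()
        with send(request, timeout=30) as response:
            response.read()  # drain the body so timing covers the full reply
        latencies.append(time.perf_counter() - start)
    return latencies


# Example (hypothetical URL; substitute your own management endpoint):
#   timings = hammer("http://localhost:8000/", 10_000)
#   print(sum(timings[:100]) / 100, sum(timings[-100:]) / 100)
```

Comparing the mean of the first and last hundred latencies alongside the memory graph should show whether the slowdown tracks memory growth on a given setup.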

aasseman avatar May 19 '22 03:05 aasseman

Update: The problem is still present on v0.19.2.

aasseman avatar May 20 '22 18:05 aasseman

Update: immediately preceding the OOM crash, queries begin failing. This causes a drop in quality of service.

[Image: out-of-memory-crash-grafana]

[Image: indexer-service-memory-leak]

kaiwetlesen avatar Sep 21 '22 17:09 kaiwetlesen

Update: the problem is still present on v0.20.12

aasseman avatar Mar 01 '23 00:03 aasseman

Figment appears to be experiencing this issue too (shared with me on March 14, 2024)

alex-pakalniskis avatar Mar 14 '24 17:03 alex-pakalniskis