
Limit Redis memory usage to 25% of system RAM

Open kixorz opened this issue 4 months ago • 24 comments

Limit Redis memory usage to 25% of system RAM and enable eviction policy to prevent OOM.

Summary

This PR configures Redis in our Docker Compose deployment to use a bounded amount of memory (25% of the container’s available RAM) and enables an eviction policy. This reduces the risk of Redis consuming all available memory, triggering out-of-memory (OOM) conditions, and forcing costly instance size increases in cloud environments.

What changed

  • Updated the Redis service command to compute a maxmemory value at container start and pass it to redis-server.
  • Set the eviction policy to volatile-lru so Redis evicts the least recently used keys that have an expiration set when under memory pressure.

Why this is needed

  • Unlimited Redis memory leads to unbounded growth: Redis holds data in memory; without a cap, it can grow until the host/container runs out of memory.
  • OOM crashes and instability: When Redis (or the container) exhausts memory, the kernel may OOM-kill processes, causing downtime and data loss (in-memory datasets) or cascading failures in dependent services.
  • Cloud cost pitfalls: The usual band-aid for OOM is to permanently bump instance sizes (more RAM). That’s expensive and scales poorly as workload grows. Setting a sane cap plus an eviction policy keeps memory predictable and avoids unnecessary instance class upgrades.

How it works

  • On container startup, we read total memory from /proc/meminfo and compute 25% for --maxmemory.
  • We set --maxmemory-policy volatile-lru to evict the least recently used keys among those with TTLs when approaching the memory cap.
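The startup computation can be sketched roughly like this (an illustrative shell fragment, not the exact compose expression from the PR):

```shell
# Illustrative sketch (assumed, not the PR's exact command): read MemTotal
# (in kB) from /proc/meminfo, take a quarter of it, and hand the result to
# redis-server together with the eviction policy.
quarter_kb=$(awk '/^MemTotal:/ {print int($2 / 4)}' /proc/meminfo)
echo "redis-server --maxmemory ${quarter_kb}kb --maxmemory-policy volatile-lru"
```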

Impact

  • Predictable memory footprint for Redis within the container.
  • Reduced risk of host-level OOM and improved overall stability.
  • volatile-lru never evicts keys without a TTL. If the dataset is dominated by non-expiring keys, you may want a different policy (see below).

Configuration and overrides

  • Default behavior: 25% of container RAM, eviction policy volatile-lru.
  • To change the allocation fraction: edit the compose command expression (e.g., use /3 or /2 instead of /4).
  • To change eviction policy: replace --maxmemory-policy volatile-lru with one of Redis’s supported policies (e.g., allkeys-lru, volatile-ttl, allkeys-random, noeviction, etc.).
  • If you prefer an explicit fixed cap, replace the computed value with a static byte value, for example: --maxmemory 2gb.
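As a sketch, a fixed-cap override might look like this in compose (the service name and flag placement are assumptions about the deployment, not taken from the PR):

```yaml
# docker-compose.override.yml (hypothetical): replace the computed value
# with a static 2gb cap and keep the eviction policy explicit.
services:
  redis:
    command: redis-server --maxmemory 2gb --maxmemory-policy volatile-lru
```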

Risks and trade-offs

  • Evictions under memory pressure: If Redis reaches the cap, keys will be evicted per the chosen policy. Applications relying on retained cache entries should tolerate misses.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

kixorz avatar Aug 26 '25 19:08 kixorz

Applications relying on retained cache entries should tolerate misses.

Are we sure this is true for self-hosted?

Also, MemTotal / 4 in our minimum-requirements setup, which is 16GB RAM + 16GB swap, would be 4GB. So we have to be sure redis does not use more than that.

I checked two self-hosted instances and one of them is using 20MB and the other is using 40MB :)

aminvakil avatar Aug 26 '25 20:08 aminvakil

@kixorz Which version of self-hosted are you on? I remember we had a memory leak problem with redis about two or three years ago, but it got fixed after a couple of releases; unfortunately I cannot remember the exact version.

Also, I was wondering why your redis instance got OOMed.

aminvakil avatar Aug 26 '25 20:08 aminvakil

Hey, thanks for the question. We ran out of disk space and Sentry stopped working. When we rebooted, redis allocated all available memory on the system and the machine hung. We had to double the RAM to make it work again. After it started working, we reduced the RAM back. Redis exhausted the memory again when it tried to start back up, because its on-disk database was already as large as the peak RAM usage from the previous run.

kixorz avatar Aug 26 '25 20:08 kixorz

Also copying this from original PR which created redis.conf:

The default value of the maxmemory setting in Redis depends on the system architecture:

  • 64-bit systems: By default, there is no limit on memory usage. This allows Redis to utilize as much RAM as the operating system permits until it runs out of available memory.
  • 32-bit systems: The implicit memory limit is typically set to 3GB, a consequence of the limitations inherent in 32-bit addressing.

I believe we can set the default value to unlimited, allowing users to adjust this setting as needed based on their specific requirements.

Originally posted by @Hassanzadeh-sd in https://github.com/getsentry/self-hosted/pull/3427#discussion_r1844945851

aminvakil avatar Aug 28 '25 20:08 aminvakil

The problem is that with maxmemory 0 there is no cap on redis memory use. There are situations where redis will use all available memory - for example after the system runs out of disk space. This hangs the system.

If the solution is to modify the conf file, feel free to close this. I think having the two params in the compose file is more elegant.

kixorz avatar Aug 28 '25 23:08 kixorz

The problem is that with maxmemory 0 there is no cap on redis memory use. There are situations where redis will use all available memory - for example after the system runs out of disk space. This hangs the system.

It's fine to set an appropriate limit on redis maxmemory if we know how much memory it uses over time.

If the solution is to modify the conf file, feel free to close this. I think having the two params in the compose file is more elegant.

We can change this PR to set the limit in redis.conf if that's OK with you. Let's see what maintainers think about this before you put work onto it though.

aminvakil avatar Aug 29 '25 11:08 aminvakil

👎🏻 on this. We should just let people change the redis config.

@kixorz if you feel strongly about the env variable, you can also use a docker-compose.override.yml file to apply your changes and use an env variable in your own installation.

BYK avatar Sep 02 '25 12:09 BYK

👎🏻 on this. We should just let people change the redis config.

@kixorz if you feel strongly about the env variable, you can also use a docker-compose.override.yml file to apply your changes and use an env variable in your own installation.

@BYK How do you feel about a default redis maxmemory instead of 0?

aminvakil avatar Sep 02 '25 12:09 aminvakil

@BYK How do you feel about a default redis maxmemory instead of 0?

Don't have enough ops expertise to make an intelligent comment about this 😅

BYK avatar Sep 02 '25 12:09 BYK

Thanks for the feedback. I updated the improvement to limit redis memory to 2GB using the config file.

Please let me know if you have other suggestions.

kixorz avatar Sep 07 '25 02:09 kixorz

Thanks for the feedback. I updated the improvement to limit redis memory to 2GB using the config file.

Please let me know if you have other suggestions.

I like this change. Alternatively, we could also limit the docker container's max memory; I have:

    mem_reservation: 512M
    mem_limit: 8G

williamdes avatar Sep 19 '25 15:09 williamdes

We need to make sure this suffices for self-hosted usage across different workloads.

Quoting myself on this: unless we know the maximum redis memory usage of self-hosted in different scenarios, you can change it locally for yourself.

aminvakil avatar Sep 19 '25 16:09 aminvakil

0 (unlimited) is not the right default. It leads to problems and system crashes.

kixorz avatar Sep 19 '25 18:09 kixorz

@williamdes I updated the docker compose file with the suggested limits.

Please let me know any other feedback.

kixorz avatar Sep 29 '25 17:09 kixorz

I updated the change based on your feedback. Redis is configured to use up to 2GB; the container limit is 2.5GB.
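Putting the pieces together, the shape described here is roughly the following (the file path and mount point are assumptions for illustration, not necessarily what the PR uses):

```yaml
services:
  redis:
    # redis.conf is assumed to contain "maxmemory 2gb" and a
    # maxmemory-policy; the container cap sits slightly above it so redis
    # hits its own limit before the cgroup OOM killer does.
    volumes:
      - ./redis.conf:/usr/local/etc/redis/redis.conf:ro
    command: redis-server /usr/local/etc/redis/redis.conf
    mem_limit: 2.5G
```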

kixorz avatar Oct 01 '25 16:10 kixorz

I checked two self-hosted instances and one of them is using 20MB and the other is using 40MB :)

Correcting myself, I just saw this in another self-hosted instance:

used_memory_human:22.68G

So I guess this differs from instance to instance?

aminvakil avatar Oct 09 '25 10:10 aminvakil

So I guess this differs from instance to instance?

Yep, I had manually set maxmemory to 4gb on the two instances that were using 20MB and 40MB, and the one with 22GB of redis usage did not have a maxmemory set.

aminvakil avatar Oct 09 '25 11:10 aminvakil

Redis is used for a lot of different use cases across sentry. I am sure that some of those will break when keys disappear sooner than expected. If any of those use cases causes infinite memory growth then that is a bug that we should investigate independently. @kixorz do you have any clue what type of redis key (e.g. what prefix) causes the memory pressure in your case?

jjbayer avatar Oct 09 '25 11:10 jjbayer

Redis is used for a lot of different use cases across sentry. I am sure that some of those will break when keys disappear sooner than expected. If any of those use cases causes infinite memory growth then that is a bug that we should investigate independently. @kixorz do you have any clue what type of redis key (e.g. what prefix) causes the memory pressure in your case?

Just to repeat: the issue is that uncapped redis will use all available system memory. It becomes a problem when redis is restarted and attempts to load the save file, which causes a crash loop.

I'd say the limit is the first step to a solution.

How can I find out what prefix is causing it?

kixorz avatar Oct 09 '25 12:10 kixorz

The issue is that uncapped redis will use all available system memory. It becomes a problem when it's restarted and attempts to load the save file, which will cause a crash loop.

I'd say the limit is the first step to a solution.

Limiting redis memory usage for your own sentry instance makes sense, but setting a default now breaks every instance that already uses > 2 GB today.

How can I find out what prefix is causing it?

Not sure, I would start by running something like redis-cli --scan | cut -d: -f1 | sort | uniq -c | sort -r to find the dominant prefixes. That does not necessarily tell you that they use a lot of memory but it might be a place to start.
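To illustrate what that pipeline does, here it is run against a few made-up key names instead of a live instance (with a -rn sort so the dominant prefix lands on top):

```shell
# Count keys by prefix (everything before the first ":"). The key names
# below are invented for the demo; against a real instance you would feed
# in `redis-cli --scan` instead of printf.
printf 'e:abc\ne:def\nts:1\nsentry.monitors.volume_history:x\n' |
  cut -d: -f1 | sort | uniq -c | sort -rn
# the "e" prefix comes out on top with a count of 2
```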

jjbayer avatar Oct 14 '25 08:10 jjbayer

@jjbayer I removed the redis memory limit on a self-hosted instance and it kept getting OOM-killed, so I put a 20GB memory limit on it to see what is taking up so much:

$ docker compose exec redis redis-cli --scan | cut -d: -f1 | sort | uniq -c | sort -n

... lots of prefixes with a count of 1, and then:

      2 rl
      4 pc
     52 tw
    128 scheduler_process
    147 c
    195 b
   1953 sentry.monitors.volume_history
   3005 ts
 417518 e

Is this normal? If this is a bug, please tell me to open an issue regarding this.

Edit: After 14 hours:

      3 pc
     48 tw
    128 scheduler_process
    243 c
    263 sentry.monitors.volume_history
    977 b
   3121 ts
 614010 e

aminvakil avatar Oct 18 '25 07:10 aminvakil

@aminvakil thanks for the input!

614010 e

Looks like you have ~600k events and/or attachments in the processing cache, which has a TTL of one hour. But these keys should typically not stick around for an entire hour; for error event data they should be deleted by the post-process-forwarder-errors docker container. Could you check if this container is running?

https://github.com/getsentry/sentry/blob/7d1e9a6f23ae204ed98084c1c6d3dac12d3f4d68/src/sentry/utils/cache.py#L14

https://github.com/getsentry/sentry/blob/7d1e9a6f23ae204ed98084c1c6d3dac12d3f4d68/src/sentry/ingest/consumer/processors.py#L37

jjbayer avatar Oct 21 '25 08:10 jjbayer

@jjbayer Those keys stick around and likely cause this issue.

If those keys are being deleted by another docker container, that may not work as intended when Redis doesn't start due to an OOM.

kixorz avatar Oct 21 '25 21:10 kixorz

@aminvakil thanks for the input!

614010 e

Looks like you have ~600k events and/or attachments in the processing cache, which has a TTL of one hour. But these keys should typically not stick around for an entire hour; for error event data they should be deleted by the post-process-forwarder-errors docker container. Could you check if this container is running?

Yes, it's running.

https://github.com/getsentry/sentry/blob/7d1e9a6f23ae204ed98084c1c6d3dac12d3f4d68/src/sentry/utils/cache.py#L14

https://github.com/getsentry/sentry/blob/7d1e9a6f23ae204ed98084c1c6d3dac12d3f4d68/src/sentry/ingest/consumer/processors.py#L37

I think it's because of slow disks and high load on this instance: events cannot be persisted to the database, so the redis backlog fills up.

Setting a maxmemory works for our instance, and we can tolerate losing a couple of events or their details until the hardware is improved.

Redis is used for a lot of different use cases across sentry. I am sure that some of those will break when keys disappear sooner than expected. If any of those use cases causes infinite memory growth then that is a bug that we should investigate independently. @kixorz do you have any clue what type of redis key (e.g. what prefix) causes the memory pressure in your case?

But you're right I do not think this is a good default to be included in self-hosted.

aminvakil avatar Oct 22 '25 10:10 aminvakil