# Redis Memory Limit Configuration

## Problem Statement
Running self-hosted Sentry 23.6.1
The docker compose configuration sets up Redis with unlimited memory (`maxmemory = 0`).
This causes problems after the instance has been running for a few days: the machine ingests many events (~100K), runs out of RAM (64 GB), and Redis then enters a restart loop (if you have a userland OOM killer) or hangs the machine (if you don't).
The web container then fails to start, rightly complaining with `ERROR: LOADING Redis is loading the dataset in memory`.
This may be related to https://github.com/getsentry/self-hosted/issues/1787 and https://github.com/getsentry/self-hosted/issues/1796 - I did not investigate why Redis would use so much memory.
Simply reconfiguring Redis after the fact does not work; it remembers the RAM it had available when the dataset was created and attempts to allocate that much every time it loads. One must wipe the whole Redis database to fix this problem.
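For reference, a quick way to check what Redis is currently using and what limit it thinks it has (assuming the compose service is named `redis`, as in the stock docker-compose.yml):

```shell
# Prints the configured limit (0 means unlimited) and current usage
docker compose exec redis redis-cli config get maxmemory
docker compose exec redis redis-cli info memory | grep -E "used_memory_human|maxmemory_human"
```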
## Solution Brainstorm

### Work-around
To avoid re-creating the whole Sentry instance, I took the following steps:

- configure Redis with a finite max memory value (in my case, 10 GB):
```diff
diff --git a/docker-compose.yml b/docker-compose.yml
index 38265d2..d4f9d6b 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -124,10 +124,16 @@ services:
       test: redis-cli ping
     volumes:
       - "sentry-redis:/data"
+      - "./redis.conf:/etc/redis/redis.conf"
     ulimits:
       nofile:
         soft: 10032
         hard: 10032
+    command:
+      [
+        "redis-server",
+        "/etc/redis/redis.conf",
+      ]
   postgres:
     <<: *restart_policy
     # Using the same postgres version as Sentry dev for consistency purposes
diff --git a/redis.conf b/redis.conf
new file mode 100644
index 0000000..2496632
--- /dev/null
+++ b/redis.conf
@@ -0,0 +1,2 @@
+maxmemory 10000M
+bind 0.0.0.0
```
- wipe existing Redis data: `docker volume rm sentry-redis`
- re-run the install script: `./install.sh`
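After the reinstall, the new limit can be verified against the running container (again assuming the `redis` service name):

```shell
# Should print "maxmemory" followed by 10000000000 (Redis parses 10000M as 10^10 bytes)
docker compose exec redis redis-cli config get maxmemory
```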
Note, from the Redis docs:

> It makes Redis return an out-of-memory error for write commands if and when it reaches the limit - which in turn may result in errors in the application but will not render the whole machine dead because of memory starvation.
Question: how badly will Sentry degrade if Redis refuses writes?
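For context, this behavior is governed by Redis's `maxmemory-policy` setting: the default `noeviction` refuses writes at the limit, while an eviction policy such as `allkeys-lru` (used in a comment further down) drops least-recently-used keys instead. A minimal redis.conf sketch; whether eviction is safe for the non-cache data Sentry keeps in Redis is exactly the open question above:

```conf
maxmemory 10gb
# evict least-recently-used keys instead of erroring on writes
maxmemory-policy allkeys-lru
```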
### Proposed solution
I think it would be useful to configure memory limits for all the databases being used:
- redis: `maxmemory = 10G` in redis.conf
- postgres: `shared_buffers = 8GB`, `effective_cache_size = 16GB`
- kafka: `-Xmx12g -Xms12g`
- zookeeper: `ZK_SERVER_HEAP = 2048`
- clickhouse: `max_server_memory_usage = 10000000000`
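For the JVM-based services, a sketch of how those heap limits could be wired through docker-compose environment variables. `KAFKA_HEAP_OPTS` (Confluent Kafka images) and `ZK_SERVER_HEAP` (official Zookeeper image) are assumptions here - check which variables the images pinned by your self-hosted version actually honor:

```yaml
services:
  kafka:
    environment:
      KAFKA_HEAP_OPTS: "-Xmx12g -Xms12g"
  zookeeper:
    environment:
      ZK_SERVER_HEAP: "2048"  # in megabytes
```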
I would also recommend that docker compose memory limits are used on all containers (e.g. 2GB for Python and 20GB for databases) to turn the OOM hangs into debuggable restart loops.
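A sketch of what such container-level limits could look like, using the 2 GB / 20 GB figures above (`mem_limit` works with plain `docker compose`; swarm deployments use `deploy.resources.limits.memory` instead):

```yaml
services:
  web:
    mem_limit: 2gb    # Python service: OOM becomes a visible restart loop
  clickhouse:
    mem_limit: 20gb   # database container
```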
Edit: after resetting Redis and running the app for a few hours, the system is completely degraded - I'm getting internal errors that don't show up in the internal project. I'll reinstall and try again.
Thanks for writing this up and attempting a solution to limit memory usage from redis. Memory limits for services in self-hosted are probably customizable based on the load your instance is receiving. It does look like you're ingesting a lot of events, so it's not super surprising that you're having issues as self-hosted is not built for scale.
Just curious, have you attempted to configure limits on the databases being used? Is it helping you? It's hard for us to decide on thresholds for memory limits, since our own configuration doesn't run into these issues.
> you're ingesting a lot of events
Yes, mainly because of previously undiscovered spam of errors (restart loops, spammy logs, etc). We will make use of the tools provided (project rate limit, custom dynamic sampling) to lower volume. We really don't need this kind of volume.
> have you attempted to configure limits on the databases being used? Is it helping you?
Yes - Redis and Clickhouse so far. Both are running perfectly with a 12 GB limit now, whereas previously Clickhouse would use 20+ GB and Redis would eat up all RAM. The other databases have acceptable memory usage, even with all the spam.
The main benefit of these limits is that, in the case of something going wrong and sending 500K events overnight, the Sentry instance does not completely crash. But the rate limits are the main protection against that - so we will use them.
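For reference, a sketch of that ClickHouse limit as a `config.d` override - the file name is illustrative, and older ClickHouse versions use `<yandex>` as the root tag instead of `<clickhouse>`:

```xml
<!-- e.g. clickhouse/config.d/memory.xml -->
<clickhouse>
    <!-- 12 GiB in bytes -->
    <max_server_memory_usage>12884901888</max_server_memory_usage>
</clickhouse>
```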
> It's hard for us to decide on thresholds for memory limits
I agree - maybe the installation script could compute a fraction of total RAM for each service, to use as default values for the limits (a sketch follows). It would complicate the installation script; that would be the trade-off.
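A hypothetical sketch of what that computation could look like (Linux-only, reads /proc/meminfo; the fractions and variable names are made up for illustration):

```shell
#!/usr/bin/env bash
# Derive default per-service memory limits from total system RAM
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
total_mb=$((total_kb / 1024))
redis_maxmemory_mb=$((total_mb * 15 / 100))             # e.g. 15% for Redis
clickhouse_limit_bytes=$((total_kb * 1024 * 30 / 100))  # e.g. 30% for ClickHouse
echo "maxmemory ${redis_maxmemory_mb}mb"
echo "max_server_memory_usage ${clickhouse_limit_bytes}"
```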
I encountered roughly the same issue on version 23.8.0. With 75K events received over 24 hours, 7 GB of RAM is depleted within 10 hours. I didn't face this issue on previous versions. I checked which keys are accumulating; mainly, they are as follows:
| name | count | type | percent |
|:-------------------------------------------------------------------------|--------:|:-------|:----------|
| c:*:* | 542273 | string | 96.35% |
| c:*:*:*:* | 542272 | string | 96.35% |
| ts:*:*:* | 11362 | hash | 2.01% |
| e:*:* | 8391 | string | 1.49% |
| sim:*:*:m:* | 3371 | set | 0.59% |
How long is the data supposed to be stored there? I want to set up rotation, but I'm not sure whether it's needed or what purpose these keys serve. Thank you
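One way to check whether those keys already carry expirations (sampling a few of the large `c:*` group; `TTL` returns -1 for keys with no expiration set):

```shell
docker compose exec redis sh -c '
  redis-cli --scan --pattern "c:*" | head -n 5 | while read -r key; do
    printf "%s -> " "$key"
    redis-cli ttl "$key"
  done
'
```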
Our 64 GB RAM instance has been running with these limits - you can probably adjust them to your instance size. We've been indexing 100k events/day every day since posting this issue, with no problems.
Oh, I was close but thank you, @gabriel-v, for the advice!
Hi everyone, I was having the same issue with high ingestion rates and redis using all my server's memory.
I edited my docker-compose.yml file with the following line:

```yaml
command: redis-server --maxmemory 10gb --maxmemory-policy allkeys-lru
```
I believe the proposed solution of having a Redis conf file is possibly better, but regardless, we need to be able to customize this; allowing Redis to use all system memory is really bad. There should be a default configuration for key eviction.