Missing data on Performance even though it shows on stats

Open inoa-dmpassy opened this issue 8 months ago • 6 comments

We got a solution. See @inoa-jboliveira's post below.

Self-Hosted Version

25.2.0

CPU Architecture

x86_64

Docker Version

28.0.1

Docker Compose Version

2.33.1

Machine Specification

  • [x] My system meets the minimum system requirements of Sentry

Steps to Reproduce

I'm running self-hosted Sentry. It was on 24.11.1, and yesterday we updated to 25.2.0. Since December we have been seeing missing data on Performance.

Look at the stats page

Image

However, when we check performance, there are times when there's a lack of transactions:

Image

For the last 14 days:

Image

Maybe it has something to do with maxmemory on Redis; we added it around the same time. https://github.com/getsentry/self-hosted/pull/3427#issuecomment-2518128717

In the Docker logs, there are a bunch of lines like this:

worker-1                                        | 14:34:45 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:d9ed0a1e03684bb4a7922078da6aeb6e:2' reason='missing_cache')

Maybe this is correlated.

Yesterday I updated to 25.2.0 and updated redis.conf:

# redis.conf

# The 'maxmemory' directive controls the maximum amount of memory Redis is allowed to use.
# Setting 'maxmemory 0' means there is no limit on memory usage, allowing Redis to use as much
# memory as the operating system allows. This is suitable for environments where memory
# constraints are not a concern.
#
# Alternatively, you can specify a limit, such as 'maxmemory 15gb', to restrict Redis to
# using a maximum of 15 gigabytes of memory.
#
# Example:
# maxmemory 0         # Unlimited memory usage
# maxmemory 15gb     # Limit memory usage to 15 GB

maxmemory 10gb

# This setting determines how Redis evicts keys when it reaches the memory limit.
# `allkeys-lru` evicts the least recently used keys from all keys stored in Redis,
# allowing frequently accessed data to remain in memory while older data is removed.
# That said we use `volatile-lru` as Redis is used both as a cache and processing
# queue in self-hosted Sentry.
# > The volatile-lru and volatile-random policies are mainly useful when you want to
# > use a single Redis instance for both caching and for a set of persistent keys.
# > However, you should consider running two separate Redis instances in a case like
# > this, if possible.

maxmemory-policy volatile-lru

Previously it was maxmemory 6gb.

But suddenly, Redis is using only about 20 MB:

root@sentry:/sentry/self-hosted# docker exec -it sentry-self-hosted-redis-1 redis-cli info memory | grep used_memory
used_memory:22518360
used_memory_human:21.48M
used_memory_rss:56135680
used_memory_rss_human:53.54M
used_memory_peak:6455694696
used_memory_peak_human:6.01G
used_memory_peak_perc:0.35%
used_memory_overhead:8784551
used_memory_startup:811944
used_memory_dataset:13733809
used_memory_dataset_perc:63.27%
used_memory_lua:526336
used_memory_lua_human:514.00K
used_memory_scripts:31160
used_memory_scripts_human:30.43K
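
To put the numbers above in proportion, here is a minimal Python sketch that parses `redis-cli info memory` output and compares current usage with the recorded peak and with the configured limit. The function names `parse_info_memory` and `usage_ratio` are illustrative, not part of Sentry or Redis; the sample values are copied from the output above, and the limit assumes the `maxmemory 10gb` setting shown earlier.

```python
def parse_info_memory(text: str) -> dict[str, str]:
    """Parse `redis-cli info memory` output into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def usage_ratio(used_bytes: int, limit_bytes: int) -> float:
    """Fraction of a byte limit that is currently in use."""
    return used_bytes / limit_bytes


# Values copied from the output above.
sample = """\
used_memory:22518360
used_memory_peak:6455694696
"""

info = parse_info_memory(sample)
used = int(info["used_memory"])
peak = int(info["used_memory_peak"])
# "maxmemory 10gb" from redis.conf; Redis reads the "gb" suffix as 1024^3 bytes.
maxmemory = 10 * 1024**3

print(f"used vs. peak:      {usage_ratio(used, peak):.2%}")
print(f"used vs. maxmemory: {usage_ratio(used, maxmemory):.2%}")
```

Both ratios come out well under 1%, so at the moment this sample was taken the instance was far below the configured limit.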

Expected Result

Performance data available

Actual Result

docker-logs.txt

System info

root@sentry:/sentry/self-hosted# hostnamectl
   Static hostname: sentry
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 59910524c6845710a9cd9b0764281fd6
           Boot ID: 57ea4dbdc4c24a8eb8922e46d065e7b1
    Virtualization: kvm
  Operating System: Ubuntu 20.04.6 LTS
            Kernel: Linux 5.15.0-1075-gcp
      Architecture: x86-64

root@sentry:/sentry/self-hosted# df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         15G   12G  2.7G  82% /
devtmpfs          16G     0   16G   0% /dev
tmpfs             16G     0   16G   0% /dev/shm
tmpfs            3.1G  5.1M  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
tmpfs             16G     0   16G   0% /sys/fs/cgroup
/dev/nvme1n1     9.8G  8.2M  9.8G   1% /sentry
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
/dev/nvme2n1     3.9T  1.9T  2.1T  48% /var/lib/docker
/dev/loop2        92M   92M     0 100% /snap/lxd/29619
/dev/loop1        56M   56M     0 100% /snap/core18/2846
/dev/loop3        45M   45M     0 100% /snap/snapd/23545
/dev/loop7        64M   64M     0 100% /snap/core20/2434
/dev/loop8        64M   64M     0 100% /snap/core20/2496
/dev/loop9        92M   92M     0 100% /snap/lxd/24061
/dev/loop10       56M   56M     0 100% /snap/core18/2855
/dev/loop0       409M  409M     0 100% /snap/google-cloud-cli/311
/dev/loop6        45M   45M     0 100% /snap/snapd/23771
/dev/loop4       409M  409M     0 100% /snap/google-cloud-cli/313
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/376702b4c2672bc9727afa8690aa7376bbc4cd7d2e7ceb2a0a3c07fadc136942/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/1090dbd3476847ad61efdb30498e85f8cbd6a0f2e302bee700aec35fe4c4a873/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/9c30e6837b275e6fcc0bdfc7c730db7f6821594d28bb150584a698ee183352b2/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/982b21fef7c0fce277edc38f14dfcd4afd7011f83f9ced0cd60c915fa5b21c16/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/0c27876c6903ffdb33277f136abd78e607c4d8e8890c3ea1f7d4378497bf6c15/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/094dd1ec006220e58aad4a16424f43c52d5e5a111633cb005c4be7fc9b459178/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/13e3b9524c52a2f2dd9941be7a8952e18d4e6273f198fffe0772f12a27920642/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/c09e2c8ffe38621097723dd9281429ab3360d32dd2e37755c4c70c5ba20db8db/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/ea40de3f8d977dbb8843fbf76e9fa27b629a80e7e30260aae02bd2a9fd2e1536/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/4216d4169a27c0287105304d43967354898ab2b21931d22bc6ddcee03bb286ae/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/cb90d77e1d8231085937561159f156a396f3d1269317d113d42e74a1f65c8794/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/879eba6ed81f80d7d72a0325c13e014e2b43d97b55ec39b450c94a7d37135ad8/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/9c43ca486f232d6df31d04d0f78accbc94678b056e87581162be49dd478226ac/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/0adad4ca5abe57daf16dc37f25ab60a7dce56f8c39ad6858d85c5f1e134986dc/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/4ad66c4361baef5d41947c0945cc6a1af75ee92d0f096939797d4f0c6d659297/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/d7299eefe6f77c752238b549c352bf9c28b76b713ecc47bfd7b932287137152b/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/5c434e00084f0eb2c7c42a9733a789ca08f4f6140973754c4c7991f5806863d7/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/ea53ec55d077c99e62b01161815ccd0850dae526865f4b3967f3e6f456257e7b/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/d27e4748a630eba713a72d8295dccfc658c3e7b38ee29ca17ad49c681445eb20/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/79f4679fa70f2f364cf0caa736b517953e2bb878bb60fef752fed2892a57d867/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/7432eccbc03a742ed5f3c868501918fb5b1e8764b04cb9c33f9ad7363e2f87b9/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/2474dda25fb56638599e86507f7f93fe9938d71ea98136180cb2c04d8c93177c/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/7861ac54f95e7d009d90f335a9d4ae5dac4bc5a05b7a8995a03364737d7ad2b1/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/b2809c9f626910fdac6f5dbfa3f50c5fa2bd6d4df2235033582c4cf84aca0329/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/8ae7a9d3e593cadf829d4aa68f5d613602c7f7c6c29c26e301deb71da16a6c10/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/eaa67fb42fe530d6d5212f0703cbd030283c9227fc0edeafacc4bd58ce150a2b/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/802a1d17295be6a6737396d6e607962dd7f5bb27bc4b202e1afa55dcd440d0e2/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/6c0dc77cdc0f687274040bfa6219ccc4b910764f642d51ea77a4bbf544beee9d/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/1042a2fd2a006077335c0ef7d2dcab9f734f42451f2c88c973d33ce0c324ee60/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/699c80d8012bde9c32f95ab4b6bdabaf542989de0e2c166e69326483e02115c0/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/33cf1d634857b02de874f7cd922e738ff42f62996050f6e95aa75e37ee82f6e8/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/cc3664feb14bfc37ded3dc9dc1ad5224211eb6ea51074bcd127977e865f410b5/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/de7f1b2f8e9bf6aec654ba15610468b4094bf22e3e1e56e01463569ffb06cc11/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/e77df70c76d750882111effe71b4586fdf45eb50bbbf0cfb4ab7cc9d420538bd/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/9cd837aa5e391454f37115be9e40d74264b978f8b08427a8e62ccb435f6a93f3/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/07badb74c4f635003d264899f0d8a63da80dae36b89b68db4f9bc43fa463b46a/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/32bd34e02c0b859d4842b3616be64d5a3a49d3bc2e611003965d790510d4dc4a/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/94464b789c704ab47a57b718d072add8e4709f715fc28a19a75281581cea3594/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/148d9ac3490cb6878e440c85b85f49eec75f676523960d95a6ece715d64a63a3/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/baf1681d0008fec1577652d61d91573a85ee1f9fd0dc89d68db7bfa9966f7e0c/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/4b29c477b4b146b974489865c8d7d41349c84395e34f71ddbf57946f96cac6b6/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/cf919e0daf06c439c7a524463996c867f8108755d1fa33686f0f13a6fa00fce6/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/5de74aa5f4cdd0140053d08bdbaf7fde5a6bc0095d14fee363c31e35ee9da72a/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/266d7735c54aa9dafa2d8711c454c3bda698b29b6961af112d1f0c2d8e5ed65d/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/b2bffc4daafa83e5c0a543e56404c02de898880d35d80a4306473eb484b308e2/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/0f962d97a34f63b7681a024b4768c14f8e2061e263bdefedd2ebe8db4bb1947d/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/5bcdddcbad35facfc1aac4720e1bc040ad9f74e0376b9e373e697c5267cd2911/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/9c667f35b8ab184c54f69d92dceb3fb839465a8fa184950d8b00f96d62e1ff2a/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/6f721ac5ae5ad38bd75b83baac2c548b7a81d707798ab82e910ba634868d98ef/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/68a7dcec67f096f6aee7407d91bd262329a504aa4bed302785ebd541cc72bdf9/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/a773b99964f38d2590917eb07c7c18c8566e903c7e26c3dc14c2cc19e8caa822/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/d56122bcd7558dd660b0f4f7c5a605314f91941cc20dd2390cf7f742db0b4133/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/e4bdcbd5e42e777a0bbd198791f802b3f6eab0fa24e7c6f9399010608e1a96b7/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/cf687f917ab876869442f2aef2fba3a99986761a06526adfaf1c68c91f92acf1/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/34acb971190338a2af516299e498f8ca65670c86e7079830f9da514e4428ec27/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/6b5e0683f75b9896ff141a106e7ca51e0216df17d60154db914e129a74806afd/merged
overlay          3.9T  1.9T  2.1T  48% /var/lib/docker/overlay2/13a899ad0bc9aa7ddffe7eea583b8d6d2ba4a4e6d4e3db46f376322005f2f41f/merged
tmpfs            3.1G     0  3.1G   0% /run/user/0

Event ID

No response

inoa-dmpassy avatar Mar 07 '25 15:03 inoa-dmpassy

Even after resetting all volumes, in the first 10 seconds I get:

worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:6147691fe92446cd8558559bfc699b3c:2' reason='missing_cache')
worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:fb663648e9ec4f8385b60ce6cbf59f75:2' reason='missing_cache')
worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:247775f80b7448f7b971406a387069b6:2' reason='missing_cache')

Tons of those.
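
To quantify "tons", one option is to count these skips per minute. The following is a minimal Python sketch, assuming the log lines have exactly the shape shown above; `count_skips_per_minute` is a hypothetical helper, not a Sentry tool.

```python
import re
from collections import Counter

# Matches the worker log lines shown above, e.g.
# "worker-1 | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (...)"
SKIP_RE = re.compile(
    r"\|\s*(?P<time>\d{2}:\d{2}:\d{2}) \[INFO\] sentry\.tasks\.post_process: "
    r"post_process\.skipped \(cache_key='(?P<key>[^']+)' reason='(?P<reason>[^']+)'\)"
)


def count_skips_per_minute(lines):
    """Count post_process.skipped lines with reason='missing_cache' per HH:MM."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match and match.group("reason") == "missing_cache":
            counts[match.group("time")[:5]] += 1  # keep HH:MM, drop seconds
    return counts


# The three lines quoted above, used as sample input.
sample_lines = [
    "worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:6147691fe92446cd8558559bfc699b3c:2' reason='missing_cache')",
    "worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:fb663648e9ec4f8385b60ce6cbf59f75:2' reason='missing_cache')",
    "worker-1                                        | 22:07:53 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:247775f80b7448f7b971406a387069b6:2' reason='missing_cache')",
]

for minute, count in sorted(count_skips_per_minute(sample_lines).items()):
    print(minute, count)
```

The same function could be fed the saved output of `docker compose logs worker`. Comparing counts before and after a configuration change gives a rough signal of whether the change helped.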

inoa-dmpassy avatar Mar 10 '25 22:03 inoa-dmpassy

Hi everyone,

The way we solved the issue was by manually changing the Snuba consumer for transactions to use more threads/workers via --concurrency:

docker-compose.yml

snuba-transactions-consumer:
    <<: *snuba_defaults
    command: rust-consumer --storage transactions --consumer-group transactions_group --auto-offset-reset=latest --max-batch-time-ms 750 --no-strict-offset-reset --concurrency 4

It seems that at some point we were "upgraded" from consumer to rust-consumer, and this new consumer cannot keep up with our demand unless concurrency is increased.

Is there a way to configure this permanently? We don't want to lose the change on the next update.

inoa-jboliveira avatar Apr 02 '25 15:04 inoa-jboliveira

It does seem like consumer can process transactions at a more stable rate than rust-consumer, though I am unsure whether this is always the case or only when the system has fallen behind for some reason.

I've also added secondary consumers for a few groups as a backup, just to reduce the chance that a crash/overload stalls things (for too long).

TaaviE avatar May 27 '25 15:05 TaaviE

I'll backlog this issue on "the list of things that need to be documented".

Summary: If the observed Kafka consumer groups show any lag, and only if the message lag is large, a second consumer should be spawned, either by starting another container manually or through services > [container name] > deploy > replicas (https://docs.docker.com/reference/compose-file/deploy/#replicas).
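
The rule in this summary can be sketched in Python as below. This is only an illustration under several assumptions: the input is text in the column layout printed by Kafka's `kafka-consumer-groups --describe` tool, the sample rows and offsets are made up, and `LAG_THRESHOLD` is a hypothetical cutoff that would have to be chosen per installation. The group name `transactions_group` is taken from the compose snippet earlier in this thread.

```python
LAG_THRESHOLD = 100_000  # hypothetical cutoff; pick one that fits your traffic


def total_lag(describe_output: str, group: str) -> int:
    """Sum the LAG column for one consumer group across all partitions."""
    lines = [ln for ln in describe_output.splitlines() if ln.strip()]
    # Locate the header row by its LAG column instead of assuming a position.
    header_idx = next(
        (i for i, ln in enumerate(lines) if "LAG" in ln.split()), None
    )
    if header_idx is None:
        raise ValueError("no header row with a LAG column found")
    header = lines[header_idx].split()
    lag_col = header.index("LAG")
    # Some Kafka versions omit the GROUP column; only filter when it exists.
    group_col = header.index("GROUP") if "GROUP" in header else None

    total = 0
    for ln in lines[header_idx + 1:]:
        cols = ln.split()
        if len(cols) <= lag_col:
            continue
        if group_col is not None and cols[group_col] != group:
            continue
        if cols[lag_col].isdigit():  # "-" means no committed offset yet
            total += int(cols[lag_col])
    return total


def needs_second_consumer(lag: int, threshold: int = LAG_THRESHOLD) -> bool:
    """Apply the rule above: add a consumer only when the lag is large."""
    return lag > threshold


# Made-up sample in the tool's column layout.
sample = """
GROUP              TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG     CONSUMER-ID  HOST         CLIENT-ID
transactions_group transactions 0          1200000         1600000         400000  rdkafka-1    /172.18.0.5  rdkafka
transactions_group transactions 1          900000          1150000         250000  rdkafka-1    /172.18.0.5  rdkafka
"""

lag = total_lag(sample, "transactions_group")
print(lag, needs_second_consumer(lag))
```

A single reading can be misleading, since lag spikes briefly under normal load. Checking it a few times over several minutes before adding a replica is the safer design.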

aldy505 avatar Jun 05 '25 03:06 aldy505

Hi @aldy505

But how can we make this a permanent change? And what is the advantage of a 2nd replica if the application provides workers?

My issue is having to keep redoing this change on every update

inoa-jboliveira avatar Jun 05 '25 20:06 inoa-jboliveira

@inoa-jboliveira Simple, use Git! Commit your changes on a separate branch. Every time you upgrade, just do git fetch and git merge 25.6.0 (version as an example). If there are conflicts, all you need to do is resolve them.

The second replica would help your installation achieve more throughput. That's all.

aldy505 avatar Jun 05 '25 22:06 aldy505

I updated from 25.5.1 to 25.10.0.

On both versions I saw the same log line: taskworker-1 | 07:09:29 [INFO] sentry.tasks.post_process: post_process.skipped (cache_key='e:f0f85672f5544b0faecd6b3cb60dec21:56' reason='missing_cache')

The VM has 32 cores and 64 GB of RAM, with 'maxmemory 16gb' (Redis). Everything else is left at the default.

Also, the 'sentry queues' command is missing in 25.10.0. Is that on purpose?

klemen-df avatar Oct 21 '25 07:10 klemen-df