Disk Space getting full
Self-Hosted Version
self-hosted-22.6.0
CPU Architecture
X64
Docker Version
self-hosted-22.6.0
Docker Compose Version
self-hosted-22.6.0
Steps to Reproduce
Hello there,
I am using Sentry for the very first time, so I am not sure how Sentry works internally.
We are running self-hosted Sentry on one of our VMs. The VM's disk keeps filling up, causing space issues. I spoke with the developers and they said they don't need any events older than 5 days. After a bit of googling, I realised there is a setting in the .env file that controls how many days of events are retained, and I have set it to 10.
dukaan@sentry:~/self-hosted-22.6.0$ cat .env
COMPOSE_PROJECT_NAME=sentry-self-hosted
SENTRY_EVENT_RETENTION_DAYS=10
# You can either use a port number or an IP:PORT combo for SENTRY_BIND
# See https://docs.docker.com/compose/compose-file/#ports for more
SENTRY_BIND=9000
# Set SENTRY_MAIL_HOST to a valid FQDN (host/domain name) to be able to send emails!
# SENTRY_MAIL_HOST=example.com
SENTRY_IMAGE=getsentry/sentry:22.6.0
SNUBA_IMAGE=getsentry/snuba:22.6.0
RELAY_IMAGE=getsentry/relay:22.6.0
SYMBOLICATOR_IMAGE=getsentry/symbolicator:0.5.1
WAL2JSON_VERSION=latest
HEALTHCHECK_INTERVAL=30s
HEALTHCHECK_TIMEOUT=60s
HEALTHCHECK_RETRIES=5
After this I restarted the VM (as I was not sure which service, i.e. which container, to restart). But the disk is still filling up. I also tried running docker system prune; it did not help.
SENTRY_EVENT_RETENTION_DAYS - does this help in reclaiming the space?
What are my options here? How can I reclaim the ever-growing space?
Here is a summary of the space consumption. It looks like the sentry-postgres volume is consuming the most space.
root@sentry:/var/lib/docker# sudo du -smh * | sort -rh
893G volumes
155G containers
21G overlay2
12M image
root@sentry:/var/lib/docker/volumes# sudo du -smh * | sort -rh
771G sentry-postgres
90G sentry-kafka
32G sentry-clickhouse
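In case it helps with triage, a quick way to watch which volume is actually growing over time (a sketch; adjust the path if your Docker root differs):
watch -n 600 'du -sh /var/lib/docker/volumes/* 2>/dev/null | sort -rh | head'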
Expected Result
NA
Actual Result
NA
The same thing happened to me today. I ended up running ./reset.sh, which deleted all the volumes, so I installed Sentry again from scratch. Can anyone help with how to set up a cron job that cleans the data from Postgres and Kafka? I added SENTRY_EVENT_RETENTION_DAYS=10 after the new installation; will it clean up by itself now?
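For what it's worth, stock self-hosted already ships a sentry-cleanup container that runs sentry cleanup on its own schedule, so an extra cron may be redundant. If you still want one on the host, a sketch of a crontab entry (the compose directory is taken from this thread; the log path is hypothetical):
0 3 * * * cd /home/dukaan/self-hosted-22.6.0 && docker-compose run --rm web cleanup --days 10 >> /var/log/sentry-cleanup.log 2>&1
Note that sentry cleanup does not touch Kafka; its retention is configured separately (see the Kafka settings later in this thread).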
@chadwhitacre @BYK I need help from you guys
@sharjeelz
I did the below inside the worker container. It has been stuck there for the last few hours (see the progress check after the output below).
root@6868e8f50bf8:/# date && sentry cleanup --days 8 && date
Wed Sep 7 08:14:32 UTC 2022
/usr/local/lib/python3.8/site-packages/sentry/runner/initializer.py:555: DeprecatedSettingWarning: The SENTRY_URL_PREFIX setting is deprecated. Please use SENTRY_OPTIONS['system.url-prefix'] instead.
warnings.warn(DeprecatedSettingWarning(old, "SENTRY_OPTIONS['%s']" % new))
08:14:34 [INFO] sentry.plugins.github: apps-not-configured
Removing expired values for LostPasswordHash
Removing expired values for OrganizationMember
Removing expired values for ApiGrant
Removing expired values for ApiToken
Removing expired files associated with ExportedData
Removing old NodeStore values
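If the cleanup hangs at the NodeStore step like this, one thing worth checking is whether it is still issuing queries in Postgres (a sketch; the container name matches the psql session below):
docker exec -it sentry-self-hosted_postgres_1 psql -U postgres -c "SELECT pid, state, wait_event_type, left(query, 80) FROM pg_stat_activity WHERE query ILIKE '%nodestore%';"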
Also, I tried deleting the data in the DB directly. This step was done before the command above.
dukaan@sentry:~$ sudo docker exec -it sentry-self-hosted_postgres_1 bash
root@1f3811799b71:/# psql -U postgres
psql (9.6.24)
Type "help" for help.
postgres=# SELECT oid::regclass, reltoastrelid::regclass, pg_relation_size(reltoastrelid) AS toast_size FROM pg_class WHERE relkind = 'r' AND reltoastrelid <> 0 ORDER BY 3 DESC;
oid | reltoastrelid | toast_size
--------------------------------------------+-------------------------+--------------
nodestore_node | pg_toast.pg_toast_20250 | 561639555072 (561 GB)
pg_rewrite | pg_toast.pg_toast_2618 | 385024
pg_statistic | pg_toast.pg_toast_2619 | 327680
sentry_activity | pg_toast.pg_toast_16494 | 155648
postgres=# DELETE FROM nodestore_node WHERE timestamp < '2022-07-25 00:00:00';
DELETE 65663623
postgres=# SELECT oid::regclass, reltoastrelid::regclass, pg_relation_size(reltoastrelid) AS toast_size FROM pg_class WHERE relkind = 'r' AND reltoastrelid <> 0 ORDER BY 3 DESC;
oid | reltoastrelid | toast_size
--------------------------------------------+-------------------------+--------------
nodestore_node | pg_toast.pg_toast_20250 | 562259468288 (562 GB)
pg_rewrite | pg_toast.pg_toast_2618 | 385024
pg_statistic | pg_toast.pg_toast_2619 | 327680
Kind of strange, but even after the delete the table size actually increased a bit. (In Postgres, DELETE only marks rows as dead; the space is not returned to the OS until the table is rewritten, which is why the on-disk size does not shrink.)
I got the reference for the above steps from here. I did not truncate the nodestore_node table. Also, it looks like running VACUUM FULL on the DB would block all writes to it, so I am not sure vacuuming will help.
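For reference, a plain VACUUM only makes the dead-tuple space reusable for new rows inside the table; it rarely shrinks the file on disk. VACUUM FULL rewrites the table and does return the space, but it holds an ACCESS EXCLUSIVE lock (blocking both reads and writes on that table) while it runs. A sketch of the non-blocking variant plus a size check, reusing the container from above:
docker exec -it sentry-self-hosted_postgres_1 psql -U postgres -c "VACUUM (VERBOSE, ANALYZE) nodestore_node;" -c "SELECT pg_size_pretty(pg_total_relation_size('nodestore_node'));"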
Try these steps and let me know if you have any luck. Sentry devs should seriously consider giving some solution here.
@sharjeelz Please don't tag individual people, it's rude. If you and @Cloud-Mak want to demand that Sentry devs manage Sentry for you then we have a paid offering for that. Otherwise we are happy to help you troubleshoot but ultimately the burden is on you and is not a source of urgency for us. That said, have you reviewed our troubleshooting documentation?
@Cloud-Mak Did you check the Stats panel in the UI? How many events per minute do you receive? Something is wrong here: our Sentry Postgres instance with 3k TPM and 90-day retention takes only 800 GB.
@eugenberend

This is a summary of the last 30 days. Please excuse the image; I am on a handheld and one of our engineers sent me this.
@chadwhitacre Sorry, I did not mean to mention you for this reason. And yes, I did go through the troubleshooting docs, but they did not say much about how cleanup works or how we can trigger it manually.
Seeing a massive disk usage spike today/yesterday which maxed out our disk. There's no apparent spike in ingestion. We upgraded to 22.7.0 a week ago, if that's relevant. The troubleshooting doc isn't very helpful with regard to disk usage or the causes of massive spikes in it. We've had self-hosted Sentry running for about 6 months now and have not seen disk usage trends like this before.

/var/lib/docker/volumes was causing most of our usage.
Ran docker system prune -a -f and docker volume prune to attempt to free up some space, but they only freed up 1GB. This suggests those volumes are in use by at least one container.
Postgres VACUUM got us about 10GB back.
sentry-data volume accounts for about 50% of our disk.
sentry-kafka volume accounts for about 25% of our disk.
sentry-postgres volume accounts for about 15% of our disk.
No way of telling which one(s) have been increasing, but I suspect sentry-data given its overall size.
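For a per-volume breakdown without digging through /var/lib/docker by hand, Docker can report it directly (a sketch; the level of detail varies by Docker version):
docker system df -v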
Hey @matthewbyrne, the spike in usage indeed does not look normal. I don't think sentry-data should be taking up that much space. Would you be able to take a look at the contents of sentry-data? If you're using Docker Desktop it should be pretty straightforward by going to Volumes -> sentry-data -> Data
@hubertdeng123 We're not running Docker Desktop on this machine. For context, 50% of the disk is ~100GB.
The contents of sentry-data are just a whole lot of this:
files/23/13e8/6561354704a33d7a8c429b8374
Nothing I can really draw any conclusions from.
I did run a find /var/lib/docker/volumes/sentry-data/_data/files/ -mindepth 1 -mtime +90 -delete and regained 30GB of disk. (our retention is set to 90d)
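If anyone else tries this, it is safer to preview what would be deleted first, since -delete is irreversible. A sketch using the same path and retention (the byte count assumes GNU find):
find /var/lib/docker/volumes/sentry-data/_data/files/ -mindepth 1 -mtime +90 -type f -printf '%s\n' | awk '{ total += $1 } END { print "Would free (bytes): " total }'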
It looks like some GC ran (better late than never), and we got back another 15% disk.
I ran a command to check the disk usage of the directories under /var/lib/docker/volumes/sentry-data/_data/files/ that were modified yesterday, and I got only a bit over 1.5GB (maybe the GC had a role in this).
I ran the same command for today and got ~65GB:
du /var/lib/docker/volumes/sentry-data/_data/files/ --time --max-depth=1 | grep 2022-09-08 | awk '{ total += $1; print }; END { print "Total (KB): " total }'
^^ Careful though: the /var/lib/docker/volumes/sentry-data/_data/files/ directory itself is included in that total, so you'll have to subtract it manually. So today, 65GB of sentry-data files were modified.
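A variant that sidesteps the subtraction by passing only the subdirectories to du (a sketch; the trailing slash in the glob matches directories only):
du -s --time /var/lib/docker/volumes/sentry-data/_data/files/*/ | grep 2022-09-08 | awk '{ total += $1; print } END { print "Total (KB): " total }'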
@matthewbyrne Got it, those look like files that Docker is potentially creating. I wonder if that's what is causing disk space to fill up for other users as well.
Is the behavior of your self-hosted instance normal after deleting the files in sentry-data? Those files may need to be cleaned up every now and then.
I believe the data in that directory isn't "docker created" as you suggest. Docker volumes are stored in a part of the host filesystem that is managed by Docker (under /var/lib/docker/volumes/). When a volume is mounted to a container, that directory is mounted into the container, i.e. what I see in that directory is exactly what sentry web sees in its /data directory.
After deleting the files older than 90 days we didn't notice any behaviour change in our Sentry instance. All was normal, though it's hard to determine what those files are for. That volume is used by a number of Sentry service containers; it is mounted at /data in the web container, for one (IIRC; it's late here and I'm on my phone).
Hm, yeah, the volume is used by 8 other containers; I was mistaken in saying "docker created". (A command to verify this follows the list.)
sentry-cleanup
web
cron
worker
ingest-consumer
post-process-forwarder
subscription-consumer-events
subscription-consumer-transactions
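For reference, Docker can list which containers mount a given volume directly (a sketch):
docker ps --filter volume=sentry-data --format '{{.Names}}'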
I looked into the getsentry/sentry repo and it turns out these files are uploaded media. https://github.com/getsentry/sentry/blob/master/docker/config.yml#L38
This config is used on this line: https://github.com/getsentry/sentry/blob/master/src/sentry/models/file.py#L98
Wondering if those are file uploads 🤔
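If they are file uploads, the filestore blobs should also be tracked in Postgres, so something like this could cross-check the totals (a sketch, assuming the default sentry_fileblob table and the postgres container name used earlier in this thread):
docker exec -it sentry-self-hosted_postgres_1 psql -U postgres -c "SELECT count(*), pg_size_pretty(sum(size)) FROM sentry_fileblob;"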
Sentry "Stats" lists no "transactions" and no "attachments", just our typical "errors" intake. Though that's not entirely true, because I stumbled upon some (very few) transactions via the "Discover" view. Artifacts (if any) are tiny (~30MB).
Is there any way to get a breakdown of what disk usage each project is responsible for?
Unfortunately I'm not sure if a breakdown of disk usage for each project exists
Hi Chad/All, I did run the cleanup script mentioned here - https://develop.sentry.dev/self-hosted/troubleshooting/#postgres
:~/self-hosted-22.6.0$ sudo docker-compose run -T web cleanup --days 7 -m nodestore -l debug
Creating sentry-self-hosted_web_run ... done
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
11:41:33 [INFO] sentry.plugins.github: apps-not-configured
11:41:34 [DEBUG] sentry.digests: Validating Redis version...
Removing expired values for LostPasswordHash
>> Skipping LostPasswordHash
Removing expired values for OrganizationMember
>> Skipping OrganizationMember
Removing expired values for ApiGrant
>> Skipping ApiGrant
Removing expired values for ApiToken
>> Skipping ApiToken
Removing expired files associated with ExportedData
>> Skipping ExportedData files
Removing old NodeStore values
Removing UserReport for days=7 project=*
>> Skipping UserReport
Removing GroupEmailThread for days=7 project=*
>> Skipping GroupEmailThread
Removing GroupRuleStatus for days=7 project=*
>> Skipping GroupRuleStatus
Removing RuleFireHistory for days=7 project=*
>> Skipping RuleFireHistory
Removing EventAttachment for days=7 project=*
>> Skipping EventAttachment
>> Skipping Group
Cleaning up unused FileBlob references
>> Skipping FileBlob
/usr/local/lib/python3.8/site-packages/sentry/runner/initializer.py:555: DeprecatedSettingWarning: The SENTRY_URL_PREFIX setting is deprecated. Please use SENTRY_OPTIONS['system.url-prefix'] instead.
warnings.warn(DeprecatedSettingWarning(old, "SENTRY_OPTIONS['%s']" % new))
/usr/local/lib/python3.8/site-packages/memcache.py:1303: SyntaxWarning: "is" with a literal. Did you mean "=="?
if key is '':
/usr/local/lib/python3.8/site-packages/memcache.py:1304: SyntaxWarning: "is" with a literal. Did you mean "=="?
if key_extra_len is 0:
This didn't do anything for space. As for pg_repack, I have only 6% of the disk left. Repack will increase disk usage before it frees any space (assuming it does; otherwise nothing I have run so far is doing anything to free space). I don't think 6% free space will allow the repack to complete (a per-table sketch follows below).
Any more ideas are appreciated.
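On the pg_repack point: it works table-by-table and needs roughly as much free space as the single table being rewritten, not the whole database, plus the pg_repack extension installed in the DB and a matching client binary, neither of which ships in the stock postgres image. A per-table sketch, assuming both are available (though since nodestore_node is the bulk of this database, 6% free likely still won't fit it):
pg_repack -U postgres -d postgres --table nodestore_node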
I'm afraid the cleanup script is all we can really offer as an attempt to clean up the space on your disk 😕. You might be able to clean up some more based on what you deem as data that is not useful as matthewbyrne has done above. As Chad mentioned, we have a paid offering of Sentry which eases this maintenance burden.
@Cloud-Mak I have a solution for this, which I implemented: change Sentry's broker to RabbitMQ. It is working more smoothly. You also need to add configuration for Kafka to limit how much data it keeps.
Sharjeel,
Great to hear from you. We ended up resetting Sentry and upped the Postgres docker image to 14 (it was 9.6, which is an end-of-life version). We lost the data and had to redo things.
BTW, can you share the steps for the message broker change and the Kafka settings?
In my case, Postgres was the biggest consumer of space.
@Cloud-Mak We had a similar problem, and things are better for us after these changes to docker-compose.yml:
KAFKA_LOG_RETENTION_HOURS: "1"
KAFKA_LOG_RETENTION_BYTES: "53687091200"
KAFKA_LOG_SEGMENT_BYTES: "134217728"
KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: "300000"
KAFKA_LOG_SEGMENT_DELETE_DELAY_MS: "60000"
KAFKA_LOG_CLEANER_ENABLE: "true"
KAFKA_LOG_CLEANUP_POLICY: "delete"
More info: https://github.com/getsentry/self-hosted/issues/1147#issuecomment-971373835
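These lines go under the kafka service's environment block in docker-compose.yml and take effect once the broker container is recreated. To sanity-check retention afterwards you can describe a topic's config (a sketch; the kafka service name and the events topic are assumed from the stock setup, and this shows topic-level overrides rather than the broker defaults set above):
docker-compose exec kafka kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name events --describe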
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀