Cleaning nodestore_node table
Problem Statement
Over time the nodestore_node table gets bigger and bigger, and currently there is no procedure to clean it up.
A forum comment by @untitaker explaining what nodestore_node is:
To answer the question about purpose of nodestore:
nodestore stores the raw, unindexed parts of the data of an event. For example all event breadcrumbs and contexts are stored there. Clickhouse only stores the data you can search and aggregate by. It only has indices, so to speak.
nodestore has multiple backends. On sentry.io it is not postgres, but google bigtable. We use the postgres backend for local development and for onpremise, but for large-scale installations it is suboptimal.
We don't have a solution right now for the storage problem in any case, sorry. One thing you should ensure actually runs is sentry cleanup, but it's definitely possible that nodestore takes this amount of disk space.
https://forum.sentry.io/t/postgres-nodestore-node-table-124gb/12753/3
Solution Brainstorm
There was an idea suggested on the forum which worked for me, but I lost all event details. Something like this would work:
DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '1 day';
VACUUM FULL public.nodestore_node;
Change 1 day according to your needs.
Maybe put this in a cron container which gets run every night. We should think about its performance issues though: this took a long time to execute on our instance, maybe because it wasn't run before, but I'm not sure.
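For illustration, a minimal sketch of what such a nightly job could look like from the host, assuming the stock self-hosted compose setup (service postgres, superuser postgres) and an installation directory of /opt/sentry-self-hosted (both are assumptions, adjust to your environment):
# hypothetical host crontab entry: delete nodestore rows older than 30 days at 03:00
0 3 * * * cd /opt/sentry-self-hosted && docker compose exec -T postgres psql -U postgres -c "DELETE FROM nodestore_node WHERE \"timestamp\" < NOW() - INTERVAL '30 days';"
# note: the very first run can take a long time if the table has never been cleaned before,
# and this intentionally omits VACUUM FULL for the locking/disk-space reasons discussed below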
cc @BYK @chadwhitacre @hubertdeng123
There are a few interesting and important things:
- We can never add the VACUUM FULL public.nodestore_node; part in any kind of production setup as it will lock the whole table for an indefinite time. It also requires additional space on disk, so it has a high potential to put the system in an inoperable and hard-to-recover state.
- The sentry cleanup command that we trigger via the cron does run nodestore.cleanup(): https://github.com/getsentry/sentry/blob/1d24b160a74e10759d20fa323f5251bffee3f7b5/src/sentry/runner/commands/cleanup.py#L260-L268 which is defined at https://github.com/getsentry/sentry/blob/1d24b160a74e10759d20fa323f5251bffee3f7b5/src/sentry/nodestore/django/backend.py#L54-L62
So technically, this should work if you set SENTRY_RETENTION_DAYS to 90 or less (90 being the default). The only missing thing I see here is an explicit VACUUM call, but autovacuum should handle that already: https://www.postgresql.org/docs/12/routine-vacuuming.html#AUTOVACUUM
Now we need to figure out which part of the puzzle is broken (see the sketch below for how to check each point):
1. Does the nodestore.cleanup() call run successfully?
2. If so, does it actually run the DELETE commands?
3. If so, does the autovacuum thingy work in the Postgres container, or do we need to enable that?
4. Even then, should we issue a manual VACUUM command after a cleanup operation?
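A rough sketch of how each point could be checked on a running self-hosted install (service names assume the stock docker-compose.yml; the cleanup invocation mirrors the one used later in this thread):
# 1 + 2: run the nodestore cleanup by hand with debug logging and watch whether it issues deletes
docker compose run --rm web cleanup --days 30 -m nodestore -l debug
# 3: check that autovacuum is on and whether it has ever touched the table
docker compose exec postgres psql -U postgres -c "SHOW autovacuum;"
docker compose exec postgres psql -U postgres -c "SELECT relname, last_vacuum, last_autovacuum, autovacuum_count, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'nodestore_node';"
# 4: a plain (non-FULL) VACUUM only marks dead rows reusable; it does not lock the table against writes
docker compose exec postgres psql -U postgres -c "VACUUM (VERBOSE) nodestore_node;"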
I have nothing to add to what BYK said. This should work out of the box assuming you run sentry cleanup
I lost all event details.
Did you lose all event details because you ran the command without substituting a real value for SENTRY_RETENTION_DAYS, is that what you mean? Does Postgres silently accept INTERVAL 'SENTRY_RETENTION_DAYS' as equivalent to zero, something like that?
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
I lost all event details.
Did you lose all event details because you ran the command without substituting a real value for SENTRY_RETENTION_DAYS, is that what you mean? Does Postgres silently accept INTERVAL 'SENTRY_RETENTION_DAYS' as equivalent to zero, something like that?
Sorry for the late response. No, I ran this exact command: DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '1 day';
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
We hit this problem as well. nodestore_node was 42GB and the table had never been vacuumed.
We ran the sentry-cleanup service, which did its thing but did not reduce disk space. We ran vacuum analyze, which did not reduce disk space. We ran vacuum full nodestore_node and that did reduce disk space to 3GB. Postgres also then reported that vacuum had been executed one time.
postgres=# select relname, vacuum_count, n_tup_del, n_live_tup, n_dead_tup
from pg_stat_sys_tables
where relname='pg_toast_20247';
relname | vacuum_count | n_tup_del | n_live_tup | n_dead_tup
----------------+--------------+-----------+------------+------------
pg_toast_20247 | 1 | 62889277 | 1840323 | 0
(1 row)
So it seems that (3) is true. The autovacuum thingy inside the postgres container is not working.
However, this seems to indicate that it is in fact enabled:
postgres=# SHOW autovacuum;
autovacuum
------------
on
(1 row)
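Whether autovacuum ever actually processed nodestore_node (and its TOAST relation, which is where the bulk of the node blobs lives) can be checked with something like this, assuming the stock postgres service:
docker compose exec postgres psql -U postgres -c "SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'nodestore_node' OR relid = (SELECT reltoastrelid FROM pg_class WHERE relname = 'nodestore_node');"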
Someone hinted on a forum that if Postgres is under load it may not get a chance to autovacuum. I don't think our installation is especially high load, but if it is the case that Postgres never runs autovacuum under nominal load then maybe sentry self-hosted does need a way to explicitly trigger vacuum on some kind of schedule.
While it is true that vacuum locks the table and uses some disk space, and temporarily doubling our 42GB table during a vacuum might have caused some issues, it only took a couple of minutes to complete the vacuum and it significantly dropped the disk usage.
If vacuum was run regularly, the size should also hover at a much lower level (e.g. 3GB instead of 42GB) and should not get large enough to cause significant issues with disk usage during the vacuum.
We would be happy to accept a periodic 2 minute complete sentry outage to avoid unbounded disk usage, but it seems that sentry did not suffer a complete outage during the vacuum anyway. We're also willing to accept a partial outage or some amount of events that are missing or missing data.
We could also schedule vacuum to occur during a specified maintenance window to further reduce the impact to production systems.
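If you accept that trade-off, a hedged sketch of a maintenance-window job (the schedule and path are purely illustrative; keep in mind VACUUM FULL still locks the table and temporarily needs roughly the table's size in free disk while it rewrites it):
# hypothetical host crontab entry: full vacuum of nodestore_node every Sunday at 04:00
0 4 * * 0 cd /opt/sentry-self-hosted && docker compose exec -T postgres psql -U postgres -c "VACUUM (FULL, VERBOSE, ANALYZE) nodestore_node;"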
I also tried the pg_repack option mentioned in the docs, but the current command did not run at all (it failed to install pg_repack inside the container), and an older version of the command I found in a GitHub issue that matched our version of postgres also failed to install inside the container.
So I think a setting to schedule a vacuum full nodestore_node in postgres is the best way forward. It could be disabled by default, but should be easy to enable via a setting that allows a specific schedule to be set.
Reading more about vacuum and vacuum full: the latter rewrites the whole table and returns disk space to the operating system, while the former frees up deleted rows for re-use but does not return disk space to the operating system. So if the sentry cleanup could run vacuum immediately after deleting rows from nodestore_node, and the cleanup frequency was high enough (or configurable) for the volume of new writes vs. deletions of old data, then there should be no more unbounded growth, without requiring a full table lock or any downtime. This should be easier to implement and more reliable than trying to configure the autovacuum daemon to do it for us?
https://www.postgresql.org/docs/12/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY
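Until something like that is built in, the same effect can be approximated by hand today; a sketch assuming the stock service names, where the plain VACUUM only makes the freed rows reusable (so the table stops growing rather than shrinking):
docker compose run --rm -T web cleanup --days 30 -m nodestore -l debug
docker compose exec -T postgres psql -U postgres -c "VACUUM (VERBOSE, ANALYZE) nodestore_node;"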
Thanks for the deep dive, @mrmachine.
In our environment, we have not yet upgraded to 23.4.0 as disk space is almost 80% full and we cannot upgrade our Postgres.
Therefore we have not yet tested if upgrading to 14.5 would have an effect on this matter.
It's possible to free up space without downtime using pg_repack
- We also encountered this issue, but it was more severe for us. Our nodestore_node table expanded to 3.6TB, so we had to focus on and resolve this problem. We tried to pull a new instance from backup data to verify how long a VACUUM FULL nodestore_node would take and found that it would take at least 8 hours. This means that if we performed this operation in production, Sentry would be down for 8 hours, which is unacceptable.
- However, we clearly understand that nodestore_node is just a KV (key-value) abstract storage, using id as the key, to store or query event JSONs, each JSON being approximately 6~70 KB. Of course, the data stored in nodestore_node is already compressed. So I naturally thought of using an object storage product like S3, which can very well solve the storage problem of nodestore_node, using id as the file path, with each event JSON being a file.
- Now, I have managed to use S3 to store the messages in nodestore_node and have submitted a PR: https://github.com/getsentry/sentry/pull/76250. I hope others facing similar issues can provide some suggestions.
[Screenshots: PostgreSQL table size and OSS (object storage) files]
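For anyone who wants to experiment before that PR lands: the backend is selected via the SENTRY_NODESTORE / SENTRY_NODESTORE_OPTIONS settings in sentry/sentry.conf.py. The snippet below is only a hedged sketch; the package name, class path and options are assumptions, so check the plugin's README (or the PR above) for the real values:
# assumption: you have created sentry/enhance-image.sh (from enhance-image.example.sh) in your self-hosted checkout
echo "pip install sentry-nodestore-s3" >> sentry/enhance-image.sh   # hypothetical package name
# then, in sentry/sentry.conf.py, point Sentry at the plugin, e.g.:
#   SENTRY_NODESTORE = "sentry_nodestore_s3.S3NodeStorage"          # hypothetical class path
#   SENTRY_NODESTORE_OPTIONS = {"bucket_name": "sentry-nodestore"}  # illustrative options
./install.sh   # rebuild so the customized image and config take effect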
UPDATE (good news, this time):
We are planning to move away from PostgreSQL as the nodestore backend to something else, probably either an S3-compatible backend (like Garage, in this PR: https://github.com/getsentry/self-hosted/pull/3498) or a filesystem-based one. An internal discussion will be held sometime next week.
The consensus is towards a filesystem-based backend right now. Since the current FS-backed node store implementation is a bit naive and geared towards debuggability, it will take some more time to build a more prod-ready node store on top of that.
But fret not as it shouldn't take too long. Also that S3/Garage patch can serve as a blueprint for people willing to experiment.
@BYK I remember that there were concerns around node store leakage. They were solved by deleting based on the postgres timestamp. If there is any leakage, it will become a major concern with a filesystem-based nodestore.
@manishrawat1992 can you explain what you mean by "leakage"?
Sorry @BYK, I remembered that there were some concerns with high nodestore disk usage, but forgot that it was due to fragmentation.
My concerns were:
- Fragmentation still remains an issue for us on a filesystem.
- I rechecked the code; we delete on the basis of creation time, which in our case would mean traversing the entire filesystem metadata.
@manishrawat1992 the main issue was not fragmentation as far as I'm aware. The main issue with Postgres is that even if you delete the rows, the disk space is not given back until a VACUUM FULL, which locks the entire table and so is not feasible. A simple fs-backed node store would not have this problem.
I rechecked the code; we delete on the basis of creation time, which in our case would mean traversing the entire filesystem metadata.
This is the main challenge that needs to be solved with a proper fs-backed node store and that's why we don't have one yet. It typically requires some sort of a mini-db (like SQLite) to keep track of the files and quickly get a list. Simply, we need to do some more research on this before we can proceed.
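Purely as an illustration of that idea (not an existing Sentry component): a small SQLite index kept next to the node files would let a cleanup job find expired ids by timestamp without crawling filesystem metadata. Paths and schema below are made up:
# hypothetical layout: one file per node id under /data/nodestore, plus an index database
sqlite3 /data/nodestore/index.db "CREATE TABLE IF NOT EXISTS nodes (id TEXT PRIMARY KEY, path TEXT, ts INTEGER); CREATE INDEX IF NOT EXISTS nodes_ts ON nodes (ts);"
# cleanup then becomes an indexed query instead of a metadata crawl:
sqlite3 /data/nodestore/index.db "SELECT path FROM nodes WHERE ts < CAST(strftime('%s','now','-30 days') AS INTEGER);" | xargs -r rm -f
sqlite3 /data/nodestore/index.db "DELETE FROM nodes WHERE ts < CAST(strftime('%s','now','-30 days') AS INTEGER);"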
@BYK what about the same Postgres structure but with the data on disk? I will try to raise a PR by Monday.
I'd still be wary of using a Postgres table for this as it will still be a very volatile table needing a vacuum eventually.
I think sqlite would be a better candidate but I need to do some more research on this.
I tried the sentry-cassandra-nodestorage plugin on ScyllaDB with native TTL and it works like a charm.
Okay so... rather than moving forward with #3498, why don't we just document in the self-hosted docs that users can choose their nodestore backend and install external plugins (S3, Cassandra, etc.) so that it won't trouble the main sentry repository? What do you think? @BYK @hubertdeng123
So from what I can read here, even setting a SENTRY_EVENT_RETENTION_DAYS=30 inside the .env (and doing a ./install.sh) does NOT clean up old data.
As far as I can tell, the value is being passed down correctly to the sentry-cleanup service since I can see it in my logs.
root@scarecrow:~/sentry# docker compose logs sentry-cleanup
sentry-cleanup-1 | 0 0 * * * gosu sentry sentry cleanup --days 30 > /proc/1/fd/1 2>/proc/1/fd/2
But no change has been applied to the size of nodestore_node even after a few days:
table_name | pg_size_pretty | pg_total_relation_size
------------------------------------------------+----------------+------------------------
nodestore_node | 160 GB | 171767357440
Therefore - as previously said - the autovacuum feature of Postgres does not seem to work in self-hosted instances.
I'd also appreciate either a flag to manually enable the vacuum call on cleanup or some documentation, how to - more easily - trigger a vacuum of the database yourself.
UPDATE: Actually I just tried to run a VACUUM FULL public.nodestore_node; but I did not have enough disk space to do that.
UPDATE 2: ACTUALLY the data from 3 months ago is still present 🙈 Why? I don't know. I expected it to delete old data the next time sentry-cleanup runs, but it clearly doesn't. So now I am doing what was initially described and deleting old entries via SQL.
Therefore - as previously said - the autovacuum feature of Postgres does not seem to work in self-hosted instances.
Well, from what I understand of autovacuum, and from experience with our self-hosted instance, autovacuum works during downtimes, i.e. when there are no changes being made to the table. On our instance, there are always inserts being done, about 1.6 inserts/second:
postgres=# select count(*),min(timestamp),max(timestamp),count(*)/extract(EPOCH FROM max(timestamp)-min(timestamp)) from nodestore_node;
count | min | max | ?column?
----------+------------------------------+-------------------------------+--------------------
12894470 | 2025-03-04 00:00:14.21486+00 | 2025-06-02 07:10:19.143762+00 | 1.6527547301563620
(1 row)
So there's no real downtime for autovacuum to work.
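For what it's worth, a plain autovacuum can run while inserts are happening; it does not need an idle window, it is just throttled. So before giving up on it, it may be worth making it much more aggressive for just this table and its TOAST data. A hedged sketch using standard per-table storage parameters (the values are only a starting point):
docker compose exec postgres psql -U postgres -c "ALTER TABLE nodestore_node SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 0, toast.autovacuum_vacuum_scale_factor = 0.01);"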
would it be bad if instead of storing the raw data on nodestore, we store it into a file system and only store the file path on nodestore?
would it be bad if instead of storing the raw data on nodestore, we store it into a file system and only store the file path on nodestore?
@lucas-zimerman I believe it would be inefficient since we'd read from both Postgres and the filesystem. On SaaS, they're using Google BigTable, which is a pretty simple KV store. For me personally, these are the only implementations that make sense:
- https://github.com/stayallive/sentry-nodestore-s3
- https://github.com/andrsp/sentry-nodestore-elastic
- https://github.com/doc-sheet/sentry-cassandra-nodestorage
I am still 95% sure something is wrong with the cleanup procedure sentry should be doing...
Postgres is the biggest contributor to the used-up disk space, but ClickHouse also takes up quite a lot of storage.
I have decreased the retention days variable in my .env to 30 days instead of 90 and rebuilt all the containers via ./install.sh, but it still didn't change the used-up space.
Then I just tried to manually delete old nodestore data via the SQL query from above and was blown away that over 10 million entries were deleted.
root@scarecrow:~/sentry# docker compose exec postgres bash
root@87e512fe3c82:/# su postgres
postgres@87e512fe3c82:/$ psql
postgres=# DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '30 day';
DELETE 10260723
This took like 4 hours and prevented any ongoing Sentry events from being processed since the table got locked (I guess). But still, PostgreSQL doesn't free up that space since it needs to vacuum. And if I try to do that, I get the following error:
postgres=# VACUUM public.nodestore_node;
ERROR: could not resize shared memory segment "/PostgreSQL.3494490198" to 67145056 bytes: No space left on device
cool.... so I guess there is no way out of this without actually buying bigger storage and moving it over?
Or is there at least something that can be done on the clickhouse side to clean up space?
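On the VACUUM error above: "could not resize shared memory segment ... No space left on device" usually refers to the container's /dev/shm (Docker's default is 64 MB, which matches the ~67 MB in the message) rather than the data disk, typically because Postgres tries to use parallel maintenance workers. Two hedged workarounds, assuming the stock compose setup:
# run the vacuum without parallel workers (Postgres 13+ syntax)
docker compose exec postgres psql -U postgres -c "VACUUM (PARALLEL 0, VERBOSE) nodestore_node;"
# or give the postgres container more shared memory by adding shm_size: "1gb" to it in docker-compose.override.yml and recreating the container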
@LordSimal there's nothing wrong with the cleanup. It fully aligns with what is documented over at https://develop.sentry.dev/self-hosted/troubleshooting/postgres/ -- Note the bold text highlighting that disk usage spikes before going down. This is not something we can help with.
We are actively looking into other storage options for nodestore though /cc @aldy505
@aldy505 may shine some light on whether it is possible to reduce Clickhouse usage.
I see that your kafka usage is not that high but you may still benefit from doing this: https://develop.sentry.dev/self-hosted/troubleshooting/kafka/#reducing-disk-usage
docker compose run --rm -T web cleanup --days 30 -m nodestore -l debug
cleared up around 15GB, so that's nice. But
docker compose run --rm -T postgres bash -c "apt update && apt install -y --no-install-recommends postgresql-14-repack && su postgres -c 'pg_repack -E info -t nodestore_node'"
doesn't work for me because it seems that pg_repack can't connect to the postgres server:
ERROR: could not connect to database: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
So I tried to do it manually, but after installing pg_repack, changing the user and executing the command I get:
postgres@aa430c3dfefd:/$ pg_repack -E info -t nodestore_node
ERROR: pg_repack failed with error: pg_repack 1.5.2 is not installed in the database
So to get pg_repack working you have to follow these steps (while Sentry is running):
docker compose exec postgres bash
apt update && apt install -y --no-install-recommends postgresql-14-repack
su postgres
psql
CREATE EXTENSION pg_repack;
\q
pg_repack -E info -t nodestore_node
and then the repacking seems to work.
No idea why the documented command with docker compose run doesn't work, but at least this should help other users struggling with this.
~Now I just have to wait and see how much the repacking helps.~ I had to cancel it before I ran out of disk space.
Another question: is it possible to just truncate the nodestore_node table? I don't care about all my performance monitoring data (since that seems to be the primary use for that table).
Maybe the repacking/vacuuming works better then.
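If losing the stored event payloads is acceptable (the searchable/aggregated data stays in ClickHouse, but opening an individual event will be missing its details, as described at the top of this thread), a truncate reclaims the disk space immediately and skips the whole dead-row/vacuum dance. A hedged sketch, at your own risk:
docker compose exec postgres psql -U postgres -c "TRUNCATE TABLE nodestore_node;"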