ClickHouse store folder grows uncontrollably and is not cleaned up
Hello team,
I'm self-hosting Plausible CE using Docker Compose (with the official setup), running Plausible 3.0.1. I've noticed that the plausible-ce_event-data volume (used by ClickHouse) keeps growing in size, even though:
- The events_v2 table only holds about 700k rows.
- system.parts reports only a few dozen active parts, totaling ~10 MiB on disk.
- Yet the underlying ClickHouse store folder is currently over 44 GiB:
/var/lib/docker/volumes/plausible-ce_event-data/_data/store/0bb/0bbbeb02-... → 44G, causing "no space left on device" errors
After investigation:
- Many large part folders (several hundred MB to >1 GiB each) exist on disk but do not correspond to any active part in system.parts.
- OPTIMIZE TABLE ... FINAL does not seem to clear them.
- These parts appear to be dangling/orphaned, possibly left behind by aborted merges or unreclaimed data.
- The files continue being updated (file timestamps) even though they're not referenced by any active table part.
- This leads to disk exhaustion over time on relatively small VPS setups.
Steps I've taken:
- Ran ClickHouse queries to list active parts and their sizes
- Compared actual files in /store with system.parts
- Verified no active merges in system.merges
- Tried OPTIMIZE TABLE ... FINAL
- Ensured no secondary container is writing to the volume
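For reference, these are the kinds of queries I ran (the table name is from my setup; detached parts are worth checking too, since failed merges can leave data there):

```sql
-- Active parts and their on-disk size for the events table:
SELECT name, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'events_v2' AND active
ORDER BY bytes_on_disk DESC;

-- Inactive parts still awaiting background removal:
SELECT name, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'events_v2' AND NOT active;

-- Parts that were detached (e.g. after a failed merge or check):
SELECT name, reason FROM system.detached_parts WHERE table = 'events_v2';
```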
Questions
- Is there anything in Plausible’s setup that might prevent ClickHouse from garbage collecting those parts?
- Could you recommend a safe cleanup strategy or confirm if manual deletion of those folders is acceptable?
Thanks a lot for this awesome project. I'd be happy to provide logs or help debug further if needed.
Best,
Thibaut
ChatGPT suggested adding a new config file and tuning merge_tree options:
plausible-ce/clickhouse/merge_tree_cleanup.xml
<clickhouse>
<merge_tree>
...
</merge_tree>
</clickhouse>
plausible-ce/docker-compose.yml
volumes:
- ./clickhouse/merge_tree_cleanup.xml:/etc/clickhouse-server/config.d/merge_tree_cleanup.xml:ro
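For completeness, a sketch of what could go into that file. These specific settings and values are guesses based on the ClickHouse docs, not something I've verified against Plausible:

```xml
<clickhouse>
    <merge_tree>
        <!-- Assumption: shorten how long (in seconds) inactive parts are
             kept on disk after a merge before background removal
             (ClickHouse default is 480). -->
        <old_parts_lifetime>300</old_parts_lifetime>
        <!-- Assumption: prefer dropping whole expired parts over row-level
             TTL rewrites, which is cheaper and frees space faster. -->
        <ttl_only_drop_parts>1</ttl_only_drop_parts>
    </merge_tree>
</clickhouse>
```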
I had to stop using self-hosted Plausible CE for this reason.
Did you find a solution, @thibautLedz? I deleted a big folder and it seems this didn't have any side effects. But, yeah, probably not the best approach to delete random filesystem folders …
I used ChatGPT to help me compare active and dead parts in the store folder and delete all inactive ones. But it's only a temporary fix; the folder is still growing.
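If anyone wants to attempt the same comparison, here is a rough sketch of the approach (paths and the helper name are mine; the active-part list would come from `SELECT name FROM system.parts WHERE table = 'events_v2' AND active`). It only lists candidates, it doesn't delete anything:

```python
from pathlib import Path

def find_orphan_parts(store_dir, active_parts):
    """Return names of part directories under store_dir that system.parts
    does not list as active."""
    orphans = []
    for entry in sorted(Path(store_dir).iterdir()):
        # Each MergeTree part is a directory; skip plain files such as
        # format_version.txt.
        if entry.is_dir() and entry.name not in active_parts:
            orphans.append(entry.name)
    return orphans
```

Point it at the table's store directory (e.g. store/0bb/0bbbeb02-.../) and review the output by hand before deleting anything.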
OK, thanks for the update. I've now set
<merge_tree>
<allow_experimental_replacing_merge_with_cleanup>1</allow_experimental_replacing_merge_with_cleanup>
</merge_tree>
and will check whether this keeps the folder size under control.
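For what it's worth, my understanding is that this setting only affects ReplacingMergeTree tables, where it enables CLEANUP merges; whether Plausible's tables qualify is an assumption on my part. With it enabled, a cleanup merge can also be forced manually, along these lines (table name is a guess):

```sql
-- Sketch: force a cleanup merge (ReplacingMergeTree only; requires the
-- allow_experimental_replacing_merge_with_cleanup setting above):
OPTIMIZE TABLE sessions_v2 FINAL CLEANUP;
```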
Same issue for me: in 3 days ClickHouse managed to fill 30 GB of disk space.
"Output": "OCI runtime exec failed: write /tmp/runc-process724548340: no space left on device"
I use the defaults described in the repo and clickhouse seems to not be such a hog anymore. Running it for 2 months like this without the need to rm -rf anything.
Oh, alright, thank you. Will test it.
I'm almost a bit embarrassed to say this because it's a bit of an RTFM issue, I guess. Here is my complete config (with some annotations) if you want to yoink it:
<clickhouse>
<storage_configuration>
<disks>
<default>
<keep_free_space_bytes>33687091200</keep_free_space_bytes>
</default>
<data>
<path>/data/</path>
<keep_free_space_bytes>33687091200</keep_free_space_bytes>
</data>
</disks>
</storage_configuration>
<logger>
<level>warning</level>
<console>true</console>
</logger>
<query_log replace="1">
<database>system</database>
<table>query_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
<engine>
ENGINE = MergeTree
PARTITION BY event_date
ORDER BY (event_time)
TTL event_date + interval 30 day
SETTINGS ttl_only_drop_parts=1
</engine>
</query_log>
<!-- Stops unnecessary logging -->
<metric_log remove="remove" />
<asynchronous_metric_log remove="remove" />
<query_thread_log remove="remove" />
<text_log remove="remove" />
<trace_log remove="remove" />
<session_log remove="remove" />
<part_log remove="remove" />
<mark_cache_size>524288000</mark_cache_size>
<profile>
<default>
<!-- https://clickhouse.com/docs/en/operations/settings/settings#max_threads -->
<max_threads>1</max_threads>
<!-- https://clickhouse.com/docs/en/operations/settings/settings#max_block_size -->
<max_block_size>8192</max_block_size>
<!-- https://clickhouse.com/docs/en/operations/settings/settings#max_download_threads -->
<max_download_threads>1</max_download_threads>
<!-- https://clickhouse.com/docs/en/operations/settings/settings#input_format_parallel_parsing -->
<input_format_parallel_parsing>0</input_format_parallel_parsing>
<!-- https://clickhouse.com/docs/en/operations/settings/settings#output_format_parallel_formatting -->
<output_format_parallel_formatting>0</output_format_parallel_formatting>
</default>
</profile>
<merge_tree>
<allow_experimental_replacing_merge_with_cleanup>1</allow_experimental_replacing_merge_with_cleanup>
</merge_tree>
</clickhouse>
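One detail worth calling out: keep_free_space_bytes is specified in raw bytes, so it's easy to get the magnitude wrong. A tiny helper (names are mine) for picking and sanity-checking the value:

```python
def gib_to_bytes(gib: float) -> int:
    """Convert GiB to the raw byte count keep_free_space_bytes expects."""
    return int(gib * 2**30)

def bytes_to_gib(b: int) -> float:
    """Convert a raw byte count back to GiB for sanity-checking a config."""
    return b / 2**30

# The value used in the config above (33687091200) is roughly 31.37 GiB:
print(round(bytes_to_gib(33687091200), 2))  # → 31.37
```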
I just bumped into this issue, so very timely response, will try your config @spammads thanks for sharing! 🤞🏽
nixos :)

SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS size,
count() AS parts
FROM system.parts
WHERE active
GROUP BY
database,
table
ORDER BY sum(bytes_on_disk) DESC
Query id: 2e9bd929-8e38-4bb8-b9af-fbf0519f781e
┌─database─┬─table───────────────────────────────┬─size───────┬─parts─┐
1. │ system │ trace_log │ 25.81 GiB │ 49 │
2. │ system │ text_log │ 688.21 MiB │ 9 │
3. │ system │ metric_log │ 213.46 MiB │ 45 │
4. │ system │ asynchronous_metric_log │ 99.09 MiB │ 11 │
5. │ system │ processors_profile_log │ 9.51 MiB │ 3 │
6. │ system │ query_log │ 6.79 MiB │ 3 │
7. │ default │ location_data │ 2.91 MiB │ 1 │
8. │ default │ sessions_v2_tmp_versioned │ 716.55 KiB │ 22 │
9. │ default │ sessions_v2 │ 688.73 KiB │ 23 │
10. │ default │ events_v2 │ 587.25 KiB │ 23 │
11. │ system │ part_log │ 357.56 KiB │ 2 │
12. │ system │ query_metric_log │ 288.17 KiB │ 2 │
13. │ system │ error_log │ 246.97 KiB │ 4 │
14. │ default │ ingest_counters │ 99.28 KiB │ 2 │
15. │ system │ asynchronous_insert_log │ 14.14 KiB │ 2 │
16. │ system │ backup_log │ 9.70 KiB │ 1 │
17. │ default │ acquisition_channel_source_category │ 7.33 KiB │ 1 │
18. │ default │ acquisition_channel_paid_sources │ 358.00 B │ 1 │
└──────────┴─────────────────────────────────────┴────────────┴───────┘
18 rows in set. Elapsed: 0.012 sec.
EDIT: the trace_log table was the culprit. I dropped the table and deleted the corresponding data (in NixOS under /var/lib/clickhouse/store; the symlink is stored in /var/lib/clickhouse/data/system/trace_log).
Down from 100% disk usage to 26%! 🥳 Let's see how the new configuration fares... :)
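For anyone else hitting this: the manual filesystem deletion can usually be avoided. Dropping the oversized system log tables from inside ClickHouse removes their store directories as well. A sketch (ClickHouse recreates these tables on restart unless they are disabled in the server config, as with the trace_log remove="remove" lines in the config earlier in this thread):

```sql
-- Reclaim space held by system log tables from inside ClickHouse:
DROP TABLE IF EXISTS system.trace_log;
DROP TABLE IF EXISTS system.metric_log;
DROP TABLE IF EXISTS system.asynchronous_metric_log;
```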