ClickHouse store folder grows uncontrollably and is not cleaned up

Open thibautLedz opened this issue 3 months ago • 10 comments

Hello team,

I'm self-hosting Plausible CE 3.0.1 using Docker Compose (with the official setup). I've noticed that the plausible-ce_event-data volume (used by ClickHouse) keeps growing in size, even though:

  • The events_v2 table only holds about 700k rows.
  • system.parts reports only a few dozen active parts, totaling ~10 MiB on disk.
  • Yet, the underlying ClickHouse store folder is currently over 44 GiB:

/var/lib/docker/volumes/plausible-ce_event-data/_data/store/0bb/0bbbeb02-... → 44G, causing "no space left on device" errors

After investigation:

  • Many large part folders (several hundred MB to >1 GiB each) exist on disk but do not correspond to any active part in system.parts.
  • OPTIMIZE TABLE ... FINAL does not seem to clear them.
  • These parts appear to be dangling / orphaned, possibly due to merges being aborted or unreclaimed data.
  • The files continue being updated (file timestamps) even though they're not referenced by any active table part.
  • This leads to disk exhaustion over time on relatively small VPS setups.

Steps I've taken:

  • Ran ClickHouse queries to list active parts and their sizes
  • Compared actual files in /store with system.parts
  • Verified no active merges in system.merges
  • Tried OPTIMIZE TABLE ... FINAL
  • Ensured no secondary container is writing to the volume
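
The comparison between the store folder and system.parts can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the exact procedure used here: `dir_size` and `unreferenced_dirs` are hypothetical helper names, `table_dir` is assumed to be the directory of a single table inside the store folder, and `active_parts` is assumed to hold the `name` values exported from `system.parts WHERE active` for that same table. It only reports; it deletes nothing.

```python
from __future__ import annotations

import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def unreferenced_dirs(table_dir: Path, active_parts: set[str]) -> list[tuple[str, int]]:
    """Return (name, bytes) for subdirectories not listed in active_parts, largest first."""
    # "detached" is a regular subdirectory of a MergeTree table, not a part,
    # so it is left out of the report.
    skip = {"detached"}
    found = []
    for entry in table_dir.iterdir():
        if entry.is_dir() and entry.name not in active_parts and entry.name not in skip:
            found.append((entry.name, dir_size(entry)))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Because the store folder holds one directory per table, including the system tables, a large directory there is not necessarily part of events_v2; the part names have to come from the table that actually owns the directory being scanned.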

Questions

  • Is there anything in Plausible’s setup that might prevent ClickHouse from garbage collecting those parts?
  • Could you recommend a safe cleanup strategy or confirm if manual deletion of those folders is acceptable?

Thanks a lot for this awesome project. I'd be happy to provide logs or help debug further if needed.

Best,
Thibaut

thibautLedz avatar Sep 21 '25 10:09 thibautLedz

ChatGPT suggested adding a new config file and tuning the merge_tree options:

plausible-ce/clickhouse/merge_tree_cleanup.xml

<clickhouse>
    <merge_tree>
        ...
    </merge_tree>
</clickhouse>

plausible-ce/docker-compose.yml

volumes:
  - ./clickhouse/merge_tree_cleanup.xml:/etc/clickhouse-server/config.d/merge_tree_cleanup.xml:ro

thibautLedz avatar Sep 21 '25 11:09 thibautLedz

I had to stop using the selfhosted Plausible CE for that reason.

e11bits avatar Sep 22 '25 20:09 e11bits

Did you find a solution, @thibautLedz? I deleted a big folder and it did not seem to have any side effects. But yeah, deleting random filesystem folders is probably not the best approach …

spammads avatar Sep 29 '25 08:09 spammads

I used ChatGPT to help me compare active parts and dead parts in the store folder and delete all the inactive ones. But it is only a temporary solution: the folder is still growing.

thibautLedz avatar Sep 29 '25 09:09 thibautLedz

OK, thanks for the update. I have now set

<merge_tree>
    <allow_experimental_replacing_merge_with_cleanup>1</allow_experimental_replacing_merge_with_cleanup>
</merge_tree>

and will see whether this keeps the folder size somewhat under control.

spammads avatar Sep 29 '25 09:09 spammads

Same issue for me. Within 3 days, ClickHouse managed to fill 30 GB of disk space.

"Output": "OCI runtime exec failed: write /tmp/runc-process724548340: no space left on device"

xmasterg avatar Dec 13 '25 10:12 xmasterg

I use the defaults described in the repo and clickhouse seems to not be such a hog anymore. Running it for 2 months like this without the need to rm -rf anything.

spammads avatar Dec 13 '25 10:12 spammads

I use the defaults described in the repo and clickhouse seems to not be such a hog anymore. Running it for 2 months like this without the need to rm -rf anything.

Oh, alright, thank you. Will test it.

xmasterg avatar Dec 13 '25 10:12 xmasterg

I'm almost a bit embarrassed to say this because it's a bit of an RTFM issue, I guess. Here is my complete config (with some annotations) if you want to yoink it:

<clickhouse>
    <storage_configuration>
        <disks>
            <default>
                <keep_free_space_bytes>33687091200</keep_free_space_bytes>
            </default>
        </disks>
        <data>
            <path>/data/</path>
            <keep_free_space_bytes>33687091200</keep_free_space_bytes>
        </data>
    </storage_configuration>

    <logger>
        <level>warning</level>
        <console>true</console>
    </logger>

    <query_log replace="1">
        <database>system</database>
        <table>query_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <engine>
            ENGINE = MergeTree
            PARTITION BY event_date
            ORDER BY (event_time)
            TTL event_date + interval 30 day
            SETTINGS ttl_only_drop_parts=1
        </engine>
    </query_log>

    <!-- Stops unnecessary logging -->
    <metric_log remove="remove" />
    <asynchronous_metric_log remove="remove" />
    <query_thread_log remove="remove" />
    <text_log remove="remove" />
    <trace_log remove="remove" />
    <session_log remove="remove" />
    <part_log remove="remove" />

    <mark_cache_size>524288000</mark_cache_size>

    <profile>
        <default>
            <!-- https://clickhouse.com/docs/en/operations/settings/settings#max_threads -->
            <max_threads>1</max_threads>
            <!-- https://clickhouse.com/docs/en/operations/settings/settings#max_block_size -->
            <max_block_size>8192</max_block_size>
            <!-- https://clickhouse.com/docs/en/operations/settings/settings#max_download_threads -->
            <max_download_threads>1</max_download_threads>
            <!-- https://clickhouse.com/docs/en/operations/settings/settings#input_format_parallel_parsing -->
            <input_format_parallel_parsing>0</input_format_parallel_parsing>
            <!-- https://clickhouse.com/docs/en/operations/settings/settings#output_format_parallel_formatting -->
            <output_format_parallel_formatting>0</output_format_parallel_formatting>
        </default>
    </profile>
    <merge_tree>
        <allow_experimental_replacing_merge_with_cleanup>1</allow_experimental_replacing_merge_with_cleanup>
    </merge_tree>
</clickhouse>

spammads avatar Dec 13 '25 10:12 spammads

I just bumped into this issue, so very timely response, will try your config @spammads thanks for sharing! 🤞🏽


nixos :) SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY
    database,
    table
ORDER BY sum(bytes_on_disk) DESC

Query id: 2e9bd929-8e38-4bb8-b9af-fbf0519f781e

    ┌─database─┬─table───────────────────────────────┬─size───────┬─parts─┐
 1. │ system   │ trace_log                           │ 25.81 GiB  │    49 │
 2. │ system   │ text_log                            │ 688.21 MiB │     9 │
 3. │ system   │ metric_log                          │ 213.46 MiB │    45 │
 4. │ system   │ asynchronous_metric_log             │ 99.09 MiB  │    11 │
 5. │ system   │ processors_profile_log              │ 9.51 MiB   │     3 │
 6. │ system   │ query_log                           │ 6.79 MiB   │     3 │
 7. │ default  │ location_data                       │ 2.91 MiB   │     1 │
 8. │ default  │ sessions_v2_tmp_versioned           │ 716.55 KiB │    22 │
 9. │ default  │ sessions_v2                         │ 688.73 KiB │    23 │
10. │ default  │ events_v2                           │ 587.25 KiB │    23 │
11. │ system   │ part_log                            │ 357.56 KiB │     2 │
12. │ system   │ query_metric_log                    │ 288.17 KiB │     2 │
13. │ system   │ error_log                           │ 246.97 KiB │     4 │
14. │ default  │ ingest_counters                     │ 99.28 KiB  │     2 │
15. │ system   │ asynchronous_insert_log             │ 14.14 KiB  │     2 │
16. │ system   │ backup_log                          │ 9.70 KiB   │     1 │
17. │ default  │ acquisition_channel_source_category │ 7.33 KiB   │     1 │
18. │ default  │ acquisition_channel_paid_sources    │ 358.00 B   │     1 │
    └──────────┴─────────────────────────────────────┴────────────┴───────┘

18 rows in set. Elapsed: 0.012 sec. 
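
To see how lopsided a result like the one above is, the sizes can be tallied per database. The following is a small Python sketch, assuming the rows have been copied out by hand as (database, table, size) tuples and that the size strings use the binary units shown in the output; `parse_size` and `bytes_by_database` are hypothetical helper names, and only a few of the rows are included.

```python
from collections import defaultdict

# Binary units as they appear in the size column of the output above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> float:
    """Convert a string such as '25.81 GiB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def bytes_by_database(rows):
    """Sum the parsed sizes of (database, table, size) rows per database."""
    totals = defaultdict(float)
    for database, _table, size in rows:
        totals[database] += parse_size(size)
    return dict(totals)


if __name__ == "__main__":
    # A few rows copied from the query result; the rest are omitted for brevity.
    rows = [
        ("system", "trace_log", "25.81 GiB"),
        ("system", "text_log", "688.21 MiB"),
        ("default", "sessions_v2", "688.73 KiB"),
        ("default", "events_v2", "587.25 KiB"),
    ]
    totals = bytes_by_database(rows)
    grand_total = sum(totals.values())
    for database, size in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        print(f"{database}: {size / grand_total:.4%} of the sampled bytes")
```

Grouping by database keeps the analytics tables and the `system` log tables apart, which is the distinction that matters in this thread.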

EDIT: the trace_log table was the culprit. I dropped the table and deleted the corresponding data (in NixOS under /var/lib/clickhouse/store; the symlink is stored in /var/lib/clickhouse/data/system/trace_log).

Down from 100% disk usage to 26%! 🥳 Let's see how the new configuration fares... :)

gvolpe avatar Dec 13 '25 12:12 gvolpe