Overlay2 Disk Space Buildup in PostHog Hobby Self-Hosted Installation
The /var/lib/docker/overlay2 directory on my system is growing rapidly and consuming a large amount of disk space. Running docker system prune -a -f, which should remove all unused resources, does not reclaim the space; the disk keeps filling up.
root@ip-10-88-81-122:/home/ubuntu# df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         20G   11G  8.8G  55% /
tmpfs            3.9G     0  3.9G   0% /dev/shm
tmpfs            1.6G  2.1M  1.6G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            784M  4.0K  784M   1% /run/user/1000
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/483f16e624886b92706cb359a43b49382da8bb0e941b92300eaa734021b375a5/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/61c74810d314349e1501316eb0691cba8a2cc492a43c192be20139921e755375/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/f3096acb8c5abe39f20c8d17596fcd27ebee2da268d8f34fb5a5675408930cb0/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/e4ad001cd84511c3300ba117ed1875de451685e06ec17abbbb1a310c3829247c/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/dc4165c6c6fb31315b14f7b12bc25c9751dee19d01b59835a11612cde3fdb2b1/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/3c62cc1fa3b29fb77ebf2e687d016885d1fbf75e6a7ed61244d7208987761604/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/faaf69098c28174384aa9dd3aca9a749098df7c6e2d7f0838f9dcb2f7ab366cb/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/58d4bfb9cc2aa2887ea5932488163e0600bd7d94a4d117921f0ea10433d02a49/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/8a2fc203a1a4b5df82cb9baddc065e64a50b78deb316f26a0605961279ad9e56/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/a29e7064412daa4b4293d910c82220a4ce236f74fa330993cd81a4372c23cf04/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/91530379a8710aa7f2a1b65b7c0e38f8e33ef255c26aece4fd06e9d23f7ef0b3/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/3af1b6f804b3ca7b8956f9ea510febdb44207f29ba16f1150ed2b032d66d6ba2/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/3e82e3ebf82ed31ddf155013793194fb4fc80172dc42925ff3466ae42a87ae09/merged
overlay           20G   11G  8.8G  55% /var/lib/docker/overlay2/67f295aa42422d4d7a271bb252ab59bd8fffad0d877794a56fdede05af250c8e/merged
Any updates on this issue?
Same issue here. /var/lib/docker/overlay2/ is usually multiple hundreds of GB in size. We're just evaluating PostHog, so only a couple hundred events have been collected.
We are seeing it too, around 4 GB a day. What is it?
Is it safe to clean up this container?
How are you guys handling this clogging of disk space?
Within 3-4 days ~~SpaceHog~~ PostHog ate ALL available server space (0.5 TB). Luckily it's a fixed-cost bare-metal server, not some auto-expanding and auto-billed cloud instance. Moving it to a 2 TB server now in an attempt to figure out what's happening.
I believe session recording blobs can also be a problem: https://posthog.com/docs/session-replay/data-retention#:~:text=Recordings%20are%20automatically%20deleted%20after,manually%20deleted%20via%20the%20UI.
/var/lib/docker/overlay2/
grows by several hundred gigabytes per day. I believe it's the session recording blobs: they're in jsonl format and gzipped, but the original jsonl files are not deleted (otherwise why compress them to begin with?)
On a new instance started 4 hours ago:
➜ session-buffer-files du -h --max-depth=1 /var/lib/docker/overlay2 | sort -rh | head -25
64G /var/lib/docker/overlay2
20G /var/lib/docker/overlay2/22ca12243f2f9f8f5cdef850c84dacdc0fb898b44424acefdd8d21d3ee83ee57
19G /var/lib/docker/overlay2/37a2890cf25afbb5531accea5a40c32ff9abde25139d6f4dc76b4fccbb24b98e
4.6G /var/lib/docker/overlay2/bd73c58dbd7e91576cc2e9a3a1c21f120c3f766b0704b130a08b863cb3de4cfa
4.6G /var/lib/docker/overlay2/a5e928b2b7731333f60f257fd10d78020d66081879c9241d7b8b8d77c5c83f99
4.6G /var/lib/docker/overlay2/644c452dc2651aa3a5070ce2a1c37d01d53c3f068c4607661f6848f82fd9c683
953M /var/lib/docker/overlay2/40be981a0c4b1e16aaba9c4b14461ef56ac11ccd532e14b970200c5917871cfa
880M /var/lib/docker/overlay2/5a93d1cd997d1dfad3b2cfa45c08c5d9ebfb2c3b8d83ccbc0c13038d1ddca142
868M /var/lib/docker/overlay2/58ccd7b0eea3761c78de44a3f2651fff900f69a7dc46123277a5b4bb142b5843
857M /var/lib/docker/overlay2/581a808e76961ce7fe967041d5c5e51e6e9a6c44b7bab44cce3676b23969e27b
818M /var/lib/docker/overlay2/92008300249f4c515156b79880df467b59671ce6f8054c106e3222b600f1e79b
728M /var/lib/docker/overlay2/8b2f680e6edad6f4ff2b7b780c902a169ab60eb5b43bcbc649eb9dfaf1343606
575M /var/lib/docker/overlay2/c20d7668291ec7c0d833b78fc2f358fb4f44cc6d077a17a57a7830654b3ae70b
533M /var/lib/docker/overlay2/e6cb565e6ed7d2617854c6676f102e92652f52aa88c4e66cde2c05514bea18ec
530M /var/lib/docker/overlay2/57fdac189c5b45a9b86ab033ac50254a25aef1836fe4a79f9ea6e35c32f49c3b
427M /var/lib/docker/overlay2/8a9f06aec0cde099006d741c0164b95d761cca2a56826f3ac47a06df7786a8cf
370M /var/lib/docker/overlay2/4af2c6e6afb50f94e617ddd1b9f5b812b16a53ff146de8d787afff6ebace6c3d
357M /var/lib/docker/overlay2/02a3d71c05d64f34645c6d8474ed9817bd249d95e8e3cff7330fae2c65f2aefa
303M /var/lib/docker/overlay2/bcd9fd02aa542bc1da07822b41db4ac22c80d9dc42aa8678a8a746e66f404506
278M /var/lib/docker/overlay2/de67a5fc04b3a1c4da43e90e72eaf2fc0d75dbe7d6a84f9fd52ad392b0768894
232M /var/lib/docker/overlay2/a633121f4492ce8c8e29ad148d346d1b136e8405b3d3a06541beac0d912fd124
229M /var/lib/docker/overlay2/010d8bcf19400830774c695e49a6dc856bbd90b81aa8b6222526d7ddb2383569
222M /var/lib/docker/overlay2/1b3471d305ef0844347bafea4aed5db584c80bf1690aaa4a382570fbbdcdece3
192M /var/lib/docker/overlay2/9a1200ae9f6d5ae9f79c065d080c578fe5e4501dea2a0706f888e144c4d500d6
160M /var/lib/docker/overlay2/04cae4d2354dfd874fd4d407534318643c257079630c6fc21d9b5e57895b464a
ll /var/lib/docker/overlay2/22ca12243f2f9f8f5cdef850c84dacdc0fb898b44424acefdd8d21d3ee83ee57/diff/code/plugin-server/.tmp/sessions/session-buffer-files
[...]
-rw-r--r-- 1 root root 1.4M May 10 13:30 1.018f62ad-d0e9-7b3d-988e-6a94a2779eb9.ee243b2d-b802-4fbe-9e04-aa49707e0f6f.jsonl
-rw-r--r-- 1 root root 76K May 10 13:25 1.018f62ad-d2ef-750a-afe2-6e70ee99941d.0007eab6-c6d8-4131-be95-a990365b6e7b.gz
-rw-r--r-- 1 root root 796K May 10 13:27 1.018f62ad-d2ef-750a-afe2-6e70ee99941d.0007eab6-c6d8-4131-be95-a990365b6e7b.jsonl
-rw-r--r-- 1 root root 124K May 10 13:26 1.018f62ad-d437-7eba-bcc9-896b9b51231c.c922d076-af96-439b-92cb-8199c2f2a6a4.gz
-rw-r--r-- 1 root root 1.4M May 10 13:26 1.018f62ad-d437-7eba-bcc9-896b9b51231c.c922d076-af96-439b-92cb-8199c2f2a6a4.jsonl
-rw-r--r-- 1 root root 76K May 10 13:25 1.018f62ad-d62b-79aa-90b6-9d1691069c81.d2de25b1-d278-4456-9167-ed11912416a4.gz
-rw-r--r-- 1 root root 774K May 10 13:27 1.018f62ad-d62b-79aa-90b6-9d1691069c81.d2de25b1-d278-4456-9167-ed11912416a4.jsonl
-rw-r--r-- 1 root root 76K May 10 13:26 1.018f62ad-d727-7704-89de-6e33722904cd.5a8315d2-6c19-4155-9d47-8c8dcfef8053.gz
-rw-r--r-- 1 root root 736K May 10 13:26 1.018f62ad-d727-7704-89de-6e33722904cd.5a8315d2-6c19-4155-9d47-8c8dcfef8053.jsonl
-rw-r--r-- 1 root root 91K May 10 13:25 1.018f62ad-e22e-70ce-b1d2-ba76f7fdef10.b8433a31-d5e8-49c8-9dae-005f00f7a6f7.gz
-rw-r--r-- 1 root root 793K May 10 13:26 1.018f62ad-e22e-70ce-b1d2-ba76f7fdef10.b8433a31-d5e8-49c8-9dae-005f00f7a6f7.jsonl
-rw-r--r-- 1 root root 25K May 10 13:25 1.018f62ad-ea44-7736-af73-2536db3d3627.9c488d53-7a3b-4eef-b457-d0ad72acf3c3.gz
-rw-r--r-- 1 root root 201K May 10 13:25 1.018f62ad-ea44-7736-af73-2536db3d3627.9c488d53-7a3b-4eef-b457-d0ad72acf3c3.jsonl
-rw-r--r-- 1 root root 51K May 10 13:25 1.018f62ad-f2ad-7a16-a531-6587a99c000e.8613b939-0930-4ea9-8448-5d48d68c0d2a.gz
-rw-r--r-- 1 root root 681K May 10 13:30 1.018f62ad-f2ad-7a16-a531-6587a99c000e.8613b939-0930-4ea9-8448-5d48d68c0d2a.jsonl
-rw-r--r-- 1 root root 73K May 10 13:27 1.018f62ad-f432-7c2d-9525-4687dda261f2.6ca1db85-002b-4f5b-b6c6-2d8eb1ae579f.gz
-rw-r--r-- 1 root root 712K May 10 13:28 1.018f62ad-f432-7c2d-9525-4687dda261f2.6ca1db85-002b-4f5b-b6c6-2d8eb1ae579f.jsonl
-rw-r--r-- 1 root root 25K May 10 13:25 1.018f62ad-fa35-7cc4-a8f6-cb4645073424.1d9d68aa-3748-4833-9f94-54e8ec7d3502.gz
[...]
If only the compressed files were kept, it would cut disk usage by a factor of 10, which is manageable.
Found another culprit:
➜ overlay2 find /var/lib/docker/containers -type f -name "*.log" -print0 | du -shc --files0-from -
3.2M /var/lib/docker/containers/8bf955312b28af19b1a2682d87ada355e8840a9001c6f099723568462722d0cf/8bf955312b28af19b1a2682d87ada355e8840a9001c6f099723568462722d0cf-json.log
89G /var/lib/docker/containers/c1db5eeac69cfa4069ba1269f485668a1b7fd2ff0219cf4db78056868bc369d0/c1db5eeac69cfa4069ba1269f485668a1b7fd2ff0219cf4db78056868bc369d0-json.log
0 /var/lib/docker/containers/1666de8530999e78ae73822eb69f1e75807ea496fc6720c36a054b9ed2c59705/1666de8530999e78ae73822eb69f1e75807ea496fc6720c36a054b9ed2c59705-json.log
44K /var/lib/docker/containers/505cf7699dc0ce332515f383d5e633db9bb03bb835c6cca0349de0e7b44370ef/505cf7699dc0ce332515f383d5e633db9bb03bb835c6cca0349de0e7b44370ef-json.log
4.0K /var/lib/docker/containers/adfd31bd32af160c405404a6f9a72c5af6e8c17ed76eb1575d52f3e8927a1e5a/adfd31bd32af160c405404a6f9a72c5af6e8c17ed76eb1575d52f3e8927a1e5a-json.log
44K /var/lib/docker/containers/6858624ea5d4c1fffdaf2b735ce19c43c868c2fb86247b96fe1f34cddf07d556/6858624ea5d4c1fffdaf2b735ce19c43c868c2fb86247b96fe1f34cddf07d556-json.log
12K /var/lib/docker/containers/daa981bcd2a258ef42846596b7d5358682cd4fd411cba746fd4149cbe24e9e8d/daa981bcd2a258ef42846596b7d5358682cd4fd411cba746fd4149cbe24e9e8d-json.log
200K /var/lib/docker/containers/10c08786257e53d47164d8ac1452b0028cec62ce684f83f213e208af9e7b67dd/10c08786257e53d47164d8ac1452b0028cec62ce684f83f213e208af9e7b67dd-json.log
56K /var/lib/docker/containers/4818f2742c121a147b316f6b0dd49776ad932ffd84e67433967007e0e0b5315e/4818f2742c121a147b316f6b0dd49776ad932ffd84e67433967007e0e0b5315e-json.log
4.0K /var/lib/docker/containers/83ce32cd6082271ac3f8a626d4f1b853a1c5873ab1d6c102f9112a878bd725e4/83ce32cd6082271ac3f8a626d4f1b853a1c5873ab1d6c102f9112a878bd725e4-json.log
4.0K /var/lib/docker/containers/12e8ff5a64161e9ef1c3c1b9daa74473148e53dba06480fb06c2679ec736eadd/12e8ff5a64161e9ef1c3c1b9daa74473148e53dba06480fb06c2679ec736eadd-json.log
6.8M /var/lib/docker/containers/556c05bc2702ef63df881232bf9fc215ec31b7c94f69e8e3a3ed2423bd5ffd87/556c05bc2702ef63df881232bf9fc215ec31b7c94f69e8e3a3ed2423bd5ffd87-json.log
4.0K /var/lib/docker/containers/b27ed0d71ab4f68ba8e439c024c6265e65f6043b9872d9f78a4cd68f8b472eca/b27ed0d71ab4f68ba8e439c024c6265e65f6043b9872d9f78a4cd68f8b472eca-json.log
40K /var/lib/docker/containers/2edf10fa9347c586b7f32e594bb207852db074371aaed3f26298d8f1a55d0e67/2edf10fa9347c586b7f32e594bb207852db074371aaed3f26298d8f1a55d0e67-json.log
644K /var/lib/docker/containers/1bc6ad427a6b61940a68c65b15da96e016b6ee733edce0fa6436537b4a9afe88/1bc6ad427a6b61940a68c65b15da96e016b6ee733edce0fa6436537b4a9afe88-json.log
89G total
That 89 GB giant is just filling up with hundreds of messages per second of this kind:
{"log":"{\"level\":\"warn\",\"time\":1715349912821,\"pid\":144,\"hostname\":\"c1db5eeac69c\",\"logContext\":{\"sessionId\":\"018f61ef-b287-795d-94b4-911b52b78104\",\"partition\":0,\"teamId\":1,\"topic\":\"session_recording_snapshot_item_events\",\"oldestKafkaTimestamp\":null,\"bufferCount\":0,\"referenceTime\":1715349912757,\"referenceTimeHumanReadable\":\"2024-05-10T14:05:12.757+00:00\",\"flushThresholdMs\":600000,\"flushThresholdJitteredMs\":488701.8770446094,\"flushThresholdMemoryMs\":586442.2524535313},\"msg\":\"[MAIN] [session-manager] buffer has no oldestKafkaTimestamp yet\"}\n","stream":"stdout","time":"2024-05-10T14:05:34.679816487Z"}
wtf??
Thanks for the suggestion, I've limited log files to 10 MB in docker-compose.yml
and now it's sort of under control, but hell, there should be a way to set the log level...
plugins:
  extends:
    file: docker-compose.base.yml
    service: plugins
  image: posthog/posthog:f1d32e6969f531577b32411e985d007f821643f6
  environment:
  logging:
    options:
      max-size: 10m
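If you want the same cap on every container rather than just the plugin server, a shared extension field keeps it in one place. A sketch only, assuming YAML merge keys are supported by your Compose version; the service names here are illustrative and may not match the hobby docker-compose.yml exactly:

x-log-limits: &log-limits
  logging:
    driver: json-file
    options:
      max-size: 10m
      max-file: "3"

services:
  plugins:
    <<: *log-limits
  web:
    <<: *log-limits
  kafka:
    <<: *log-limits

The same limits can also be set daemon-wide via log-opts in /etc/docker/daemon.json, which covers containers that aren't defined in this compose file.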
All the detail from folks here is amazing... one of the hardest things in understanding issues with self-hosted deployments is that the variability between deployments makes gathering info super difficult, so all this upfront detail is a huge help.
I can believe that the elastic deployment of PostHog we run could hide something that you all are experiencing.
[session-manager] buffer has no oldestKafkaTimestamp yet
this is an unexpected condition: we receive a recording event, which might mean we create a new session manager for that session; we then add the event to the session manager and use the timestamp from that event to set oldestKafkaTimestamp
https://github.com/PostHog/posthog/blob/703a4ece7c994582202c356f804d4d376db0844e/plugin-server/src/main/ingestion-queues/session-recording/services/session-manager.ts#L169-L178
i think that logically the presence of this log means either we're trapped in the destroying state for a recording that's receiving traffic or your events don't have timestamps 🤯
does session replay work on these deployments?
for the logging volume
we use the pino log library in the plugin server...
you can set log level using the LOG_LEVEL environment variable
https://github.com/PostHog/posthog/blob/37a08e808ca198f2e26916bf9294069f6080819f/plugin-server/README.md?plain=1#L133
with supported values here https://github.com/PostHog/posthog/blob/224a5d5d0c07f880b19dbc02cce2f07b965023c0/plugin-server/src/types.ts#L39-L46
the default level if not overridden in the environment is info
and we can do this to turn down the amount of logging anyway https://github.com/PostHog/posthog/pull/22251
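for the hobby docker-compose.yml that would look roughly like this (a sketch only, merge it into your existing plugins service rather than duplicating it, and whether warn or error is the right level is your call):

plugins:
  extends:
    file: docker-compose.base.yml
    service: plugins
  environment:
    LOG_LEVEL: warn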
i think that logically the presence of this log means either we're trapped in the destroying state for a recording that's receiving traffic or your events don't have timestamps 🤯
does session replay work on these deployments?
This is interesting, as session replay worked upon initial installation but stopped working later today. Playback won't start (it sits on a "Buffering" message), and the console shows an error about scenes.session-recordings.root missing from the store. Same as described here:
https://posthog.com/questions/video-not-playing
So I think it's a common issue, at least with self-hosted. I will look into it tomorrow and open another issue.
you can set log level using the LOG_LEVEL environment variable
https://github.com/PostHog/posthog/blob/37a08e808ca198f2e26916bf9294069f6080819f/plugin-server/README.md?plain=1#L133
with supported values here
https://github.com/PostHog/posthog/blob/224a5d5d0c07f880b19dbc02cce2f07b965023c0/plugin-server/src/types.ts#L39-L46
the default level if not overridden in the environment is info
Thanks for mentioning this! It's a very important setting, the default should be 'error' I think... either way, it's worth adding to https://posthog.com/docs/self-host/configure/environment-variables
will open another issue.
feel free to keep it here if it seems related.
thanks for taking the time 🥇
Another problem is Kafka logging:
Every 60.0s: du -h --max-depth=1 /var/lib/docker/overlay2 | sort -rh | head -15 t.xfeed.com: Mon May 13 02:15:13 2024
238G /var/lib/docker/overlay2
207G /var/lib/docker/overlay2/8f83ff9dce79104e554c7e76c7805cd77af31cd15eb183f1ac6a518dadfaa389
5.0G /var/lib/docker/overlay2/f80e07831c327082c849d0839efadda5d280e1858b782028594347aeec75b7d7
du -hsc /var/lib/docker/overlay2/8f83ff9dce79104e554c7e76c7805cd77af31cd15eb183f1ac6a518dadfaa389/diff/bitnami/kafka/data/session_recording_snapshot_item_events-0
100G session_recording_snapshot_item_events-0
(everything doubles in size due to overlay2 diffs)
➜ data ll session_recording_snapshot_item_events-0 | more
total 100G
-rw-r--r-- 1 cook root 83K May 11 11:42 00000000000000000000.index
-rw-r--r-- 1 cook root 1.0G May 11 11:42 00000000000000000000.log
-rw-r--r-- 1 cook root 118K May 11 11:42 00000000000000000000.timeindex
-rw-r--r-- 1 cook root 89K May 11 12:04 00000000000000012626.index
-rw-r--r-- 1 cook root 1.0G May 11 12:04 00000000000000012626.log
-rw-r--r-- 1 cook root 10 May 11 11:42 00000000000000012626.snapshot
-rw-r--r-- 1 cook root 126K May 11 12:04 00000000000000012626.timeindex
-rw-r--r-- 1 cook root 88K May 11 12:26 00000000000000025804.index
-rw-r--r-- 1 cook root 1.0G May 11 12:26 00000000000000025804.log
-rw-r--r-- 1 cook root 10 May 11 12:04 00000000000000025804.snapshot
-rw-r--r-- 1 cook root 126K May 11 12:26 00000000000000025804.timeindex
-rw-r--r-- 1 cook root 80K May 11 12:51 00000000000000039145.index
[...]
Will figure out how to get it under control and create a PR later.
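For now, something like this on the Kafka service should cap the topic size. A sketch only, assuming the bitnami Kafka image used here, where KAFKA_CFG_* variables map onto broker settings, and noting that log.retention.bytes applies per partition:

kafka:
  environment:
    KAFKA_CFG_LOG_RETENTION_HOURS: "24"
    KAFKA_CFG_LOG_RETENTION_BYTES: "10737418240"  # ~10 GB per partition, adjust to taste
    KAFKA_CFG_LOG_SEGMENT_BYTES: "1073741824"     # 1 GB segments, matching the files listed above

Kafka only deletes whole segments, so expect usage to hover a segment or so above the configured retention.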
I think this can be closed, as setting LOG_LEVEL, limiting container logging, and adjusting Kafka log retention in docker-compose.yml
completely fixed this for us: we now accumulate just a couple of gigabytes of data daily, most of it session recordings in MinIO, which can either be offloaded to S3 or will be automatically wiped after 30 days, so space usage is under control now.
I'll close it... folks are welcome to re-open or open a follow-up issue with more details if the fixes here don't work for you all