
Cubestore Router space is growing continuously due to temp-uploads

viktordebulat opened this issue 11 months ago · 13 comments

Problem

We're using a self-hosted Cube Store (1 router + 1 worker) for pre-aggregations in production mode; there are also a Cube.js API instance and a refresh worker. Space on the router keeps growing because of the temp-uploads directory, which never seems to be purged. I see csv.gz files from day 0 and even files for pre-aggregations that don't exist anymore.


For storage we're using a self-managed S3-compatible storage (similar to MinIO) in the same k8s cluster as Cube.

Last versions tried: v1.1.0 and v1.1.15.

The router config is simple and uses the standard env variables approach: CUBESTORE_SERVER_NAME, CUBESTORE_WORKERS, CUBESTORE_META_PORT, CUBESTORE_MINIO_SERVER_ENDPOINT, CUBESTORE_MINIO_BUCKET, CUBESTORE_MINIO_ACCESS_KEY_ID, CUBESTORE_MINIO_SECRET_ACCESS_KEY.
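
For reference, here is an illustrative sketch of that env-based router config; every value below is a placeholder, not our actual endpoint or credentials:

# Illustrative Cube Store router environment (placeholder values only)
export CUBESTORE_SERVER_NAME=cubestore-router:9999
export CUBESTORE_WORKERS=cubestore-worker-0:10000
export CUBESTORE_META_PORT=9999
export CUBESTORE_MINIO_SERVER_ENDPOINT=http://minio.cube.svc.cluster.local:9000
export CUBESTORE_MINIO_BUCKET=cubestore
export CUBESTORE_MINIO_ACCESS_KEY_ID=<access-key>
export CUBESTORE_MINIO_SECRET_ACCESS_KEY=<secret-key>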

My questions are:

  • Any clue why old (and even no-longer-existing) pre-aggregation files are not deleted from temp-uploads on the Cube Store router? Maybe there is some config we've missed? (A quick inspection sketch follows this list.)
  • How does Cube Store local storage housekeeping work? Can it be configured somehow, or should it be managed by external cron jobs?
  • Could it be related to errors we see in the router logs from time to time, like CubeError { message: "File sizes for cachestore-current doesn't match after upload. Expected to be 24 but 0 uploaded", backtrace: "", cause: Internal } for metastore-* and cachestore-current? The files are present in the S3 storage; there may just be a short lag after upload before a file becomes available, because its metadata hasn't been written yet. Does Cube Store retry the file size check on error, or not?
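
For the first question, a minimal inspection sketch (not from the original report) that we can run inside the router pod; the data path below is the default .cubestore location and may differ per deployment:

# Inside the router pod: how large and how old is the staging directory?
cd /cube/.cubestore/data/temp-uploads
du -sh .                           # total size of temp-uploads
ls -lt | tail -n 20                # oldest entries at the bottom
find . -type f -mtime +7 | wc -l   # count of files older than a week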

viktordebulat · Jan 19 '25

We are facing similar issues with growing space consumption in our router container.

Some more insight into how the cachestore, metastore, and temp files are handled would be much appreciated. In what ways can we configure Cube in this regard?

allekai · Jan 22 '25

Okay, we stopped changing the schema and pre-aggregations for a while, and the router space has stopped growing:

[Graph: router disk usage levelling off]

At the same time worker space decreased:

[Graph: worker disk usage decreasing]

Some housekeeping under the hood, I assume, as the S3 volume space also decreases symmetrically with the worker:

[Graph: S3 volume usage decreasing in step with the worker]

So, we need to understand why temp files are not deleted from the router (again, there are a couple related to pre-aggregations that no longer exist).

viktordebulat · Jan 23 '25

Did a manual cleanup of old files in the temp-uploads dir. Will observe whether it grows that fast again. But it looks like housekeeping only works for the worker.

viktordebulat · Jan 27 '25

Hey guys! I'm having the same issue: even with a test DB that only has a few dozen rows, the space taken in my storage keeps growing by hundreds of MB per day! Can you please share more details on how you got it to stop growing? @igorlukanin for visibility.

LeftoversTodayAppAdmin · Feb 17 '25

We decreased the number of cubes and pre-aggregations for now. The worker does some cleanup, but the router doesn't, so for now we had to clean up old temp files manually inside the pod with a bash script.
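
Roughly, the cleanup just deletes temp-upload files older than some cutoff. A minimal sketch along those lines (not the exact script we run; the data path and the 2-day retention below are assumptions, so keep the window longer than your slowest pre-aggregation build):

#!/bin/sh
# Sketch: delete router temp-upload files older than RETENTION_DAYS.
TEMP_DIR=/cube/.cubestore/data/temp-uploads
RETENTION_DAYS=2

# Only touch files older than the retention window so in-flight uploads survive.
find "$TEMP_DIR" -type f -mtime +"$RETENTION_DAYS" -print -exec rm -f {} \;

It can be run manually inside the pod or wired into a Kubernetes CronJob.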

viktordebulat · Feb 20 '25

Thank you @viktordebulat. I have reduced the number of pre-aggregations by removing all the ones I was not yet using. I hope the Cube team puts out an update to clean up more aggressively or provides env variables so we can set the TTL ourselves.

LeftoversTodayAppAdmin · Feb 21 '25

Hi folks. We're experiencing a similar issue.

Is there a timeline for the fix? Thank you.

ghalieshredacre2 · Mar 26 '25

Hello team, we are also facing the same issue: the Cube Store router space grew to 12 GB in a span of 10-15 minutes, and due to the high volume the pod keeps restarting. Can someone tell me how you solved the issue?

Madhava-Marri · Apr 23 '25

Can someone tell me how you solved the issue?

We decreased the number of aggregations in the schema, cleaned up the temp files on Cube Store manually, and it never came back. Probably some job got stuck due to a bug, which led to the increasing space. We had some aggregation jobs failing due to a lack of source-DB resources. We also reduced concurrency.

viktordebulat · Apr 23 '25

Hi team, the issue was with the S3 bucket: we are using the cache from S3, and that bucket had a large amount of data in it, so the cache kept piling up and caused the spike. We changed the S3 bucket and the issue is gone.

Madhava-Marri · Apr 24 '25

We decreased the number of cubes and pre-aggregations for now. The worker does some cleanup, but the router doesn't, so for now we had to clean up old temp files manually inside the pod with a bash script.

Can you please help with a similar script?

arkapravasinha · Sep 01 '25

# du -sh /cube/.cubestore/data
35G     /cube/.cubestore/data

# cd /cube/.cubestore/data
# du -sh *
...
17M     metastore-1758693449652
17M     metastore-1758693751657
17M     metastore-1758694053879
17M     metastore-1758694355999
17M     metastore-1758694658056
17M     metastore-1758694960127
17M     metastore-1758695262209

# ls -l metastore-1758695262209
total 17092
-rw-r--r--    1 root root 17477190 Sep 24 06:27 000008.log
-rw-r--r-- 3667 root root     2455 Sep 11 11:48 000009.sst
-rw-r--r--    1 root root       16 Sep 24 06:27 CURRENT
-rw-r--r--    1 root root      178 Sep 24 06:27 MANIFEST-000005
-rw-r--r-- 3667 root root     6918 Sep 11 11:43 OPTIONS-000007

kong62 · Sep 24 '25

Can someone tell me how you solved the issue?

We decreased the number of aggregations in the schema, cleaned up the temp files on Cube Store manually, and it never came back. Probably some job got stuck due to a bug, which led to the increasing space. We had some aggregation jobs failing due to a lack of source-DB resources. We also reduced concurrency.

Can you share the cleanup script? I am facing a similar issue.

Naman-2001 · Sep 30 '25