
Logs and files are not being deleted on self-hosted

Open darkole opened this issue 6 months ago • 6 comments

Actual blob files are not deleted from storage/files. Also, with a lot of updates the DB grows out of control; even after deleting all tables, the system tables take up gigabytes.

darkole avatar May 18 '25 20:05 darkole

Blob files are not deleted from disk. File storage deletes are treated as soft deletes. If you'd like hard deletes, that's something you have to handle yourself, and we recommend doing it asynchronously. With the default self-hosted configuration, the blobs are in a folder on disk. If you set up S3 storage, they're in S3.

As for system tables - there's a system table cleanup worker here that cleans up session requests. https://github.com/get-convex/convex-backend/blob/main/crates/application/src/system_table_cleanup/mod.rs#L143

By default, it's configured to keep 2 weeks of mutations, but you can override that with an environment variable. https://github.com/get-convex/convex-backend/blob/main/crates/common/src/knobs.rs#L560

nipunn1313 avatar May 18 '25 21:05 nipunn1313

Thank you very much. Could you please share the best practice for deleting blob files, since their names are not the same as what getUrl or get return in StorageActionWriter? What would be the best way to work out which file to delete? I implemented a sha256 comparison, but it seems suboptimal. Maybe I can somehow get the blob name on creation (upload) and store it in a table? Anyway, thank you so much again!

darkole avatar May 18 '25 22:05 darkole

Admittedly, it's not very easy to correlate the storage_id with the storage_key https://github.com/get-convex/convex/blob/main/crates/model/src/file_storage/types.rs#L23

There's a _file_storage system table that has the correlation, but it's not very easily accessible.

The inefficient sha256 might be the workaround for now for you.

Eventually, there are a couple of things I want us to implement on the Convex side:

  • A nice admin tool to print system tables
  • Some automated tool/script for cleaning up old soft-deleted storage data

Both of these could be written in Rust using helpers from the codebase, but neither exists yet. We have an internal version of the first one, but it's hard to open source. I'll see what I can do.

nipunn1313 avatar May 21 '25 04:05 nipunn1313
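
For readers trying the sha256 workaround mentioned above, here is a minimal sketch of what the correlation might look like. It assumes you have already exported a list of live files (storage ID plus checksum) from your own app, for example from the `_storage` system table, and that each .blob file on disk holds the raw file contents. The names `LiveFile`, `BlobFile`, `readBlobs`, and `correlateBySha256` are hypothetical, and because the encoding of the exported checksum is an assumption, the sketch tries both hex and base64.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shapes for illustration; these are not Convex API types.
interface LiveFile {
  storageId: string;
  sha256: string; // hex or base64; both encodings are tried below
}
interface BlobFile {
  name: string;
  content: Uint8Array;
}

// Read every *.blob file in a directory into memory.
// Fine for a sketch; stream large files instead of buffering them.
function readBlobs(blobDir: string): BlobFile[] {
  return readdirSync(blobDir)
    .filter((name) => name.endsWith(".blob"))
    .map((name) => ({ name, content: readFileSync(join(blobDir, name)) }));
}

// Map each blob name to the storageId whose checksum it matches,
// or to null when no live file has that checksum.
function correlateBySha256(
  blobs: BlobFile[],
  live: LiveFile[],
): Map<string, string | null> {
  const bySha = new Map<string, string>();
  for (const file of live) bySha.set(file.sha256, file.storageId);

  const result = new Map<string, string | null>();
  for (const blob of blobs) {
    const digest = createHash("sha256").update(blob.content).digest();
    const id =
      bySha.get(digest.toString("hex")) ??
      bySha.get(digest.toString("base64")) ??
      null;
    result.set(blob.name, id);
  }
  return result;
}
```

Blobs that map to null are only candidates for a hard delete, not proven orphans. Two uploads with identical contents share a checksum, so a match tells you the blob is still needed, not which storage ID owns it.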

I'm starting to encounter the same issues now that I've begun testing my app against some light traffic. Image uploading is a core part of it, specifically for the purposes of figuring out which images to delete vs. keep, so for my use case I expected, and am now experiencing, a lot of "stale files".

If the automated cleanup of soft deletes is not on the roadmap, I'm going to need to figure out some sort of solution in the interim, but to your point it's difficult to connect the .blob file to the storageId. I'd appreciate any guidance on the current state of "best practice" for ensuring a self-hosted deployment doesn't just keep growing in storage.

thank you for your time and consideration.

mathematicalmichael avatar Jul 05 '25 07:07 mathematicalmichael

Right now, the best option is to use either creation time or sha256 to correlate the files that you list out with the blob files on disk.

Another option is to use a component like the R2 component (or write your own) to store files on a third-party file storage platform.

https://www.convex.dev/components/cloudflare-r2

nipunn1313 avatar Jul 07 '25 23:07 nipunn1313
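
The creation-time option above could be sketched roughly as follows. This is a heuristic under stated assumptions, not a documented procedure: it assumes a blob's modification time on disk is close to the `_creationTime` of its storage record, and it uses file size as a second filter. The shapes `LiveFile` and `BlobInfo`, the helper names, and the 5 second tolerance are all hypothetical choices for illustration.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shapes for illustration; these are not Convex API types.
interface LiveFile {
  storageId: string;
  creationTimeMs: number; // e.g. taken from the record's _creationTime
  size: number; // bytes
}
interface BlobInfo {
  name: string;
  mtimeMs: number;
  size: number;
}

// Collect name, modification time, and size for every *.blob file in a directory.
function statBlobs(blobDir: string): BlobInfo[] {
  return readdirSync(blobDir)
    .filter((name) => name.endsWith(".blob"))
    .map((name) => {
      const stats = statSync(join(blobDir, name));
      return { name, mtimeMs: stats.mtimeMs, size: stats.size };
    });
}

// Pair each live file with the unclaimed blob of identical size whose mtime is
// closest to the record's creation time, within toleranceMs (an assumed value).
// Blobs left unpaired are returned as deletion candidates, not proven orphans.
function matchByCreationTime(
  blobs: BlobInfo[],
  live: LiveFile[],
  toleranceMs = 5000,
): { matches: Map<string, string>; candidates: string[] } {
  const claimed = new Set<string>();
  const matches = new Map<string, string>(); // storageId -> blob name
  for (const file of live) {
    let best: BlobInfo | undefined;
    let bestGap = Infinity;
    for (const blob of blobs) {
      if (claimed.has(blob.name) || blob.size !== file.size) continue;
      const gap = Math.abs(blob.mtimeMs - file.creationTimeMs);
      if (gap <= toleranceMs && gap < bestGap) {
        best = blob;
        bestGap = gap;
      }
    }
    if (best) {
      matches.set(file.storageId, best.name);
      claimed.add(best.name);
    }
  }
  const candidates = blobs
    .filter((blob) => !claimed.has(blob.name))
    .map((blob) => blob.name);
  return { matches, candidates };
}
```

Because this avoids hashing, it is cheap to run over a large directory, but it can mispair files of equal size uploaded close together. Confirming the candidates with a checksum before any hard delete would be the cautious choice.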

thank you! I tried out the R2 component and coded in eager deletions.

That worked beautifully, and the migrations component has been helpful in switching storage without breaking links.

mathematicalmichael avatar Jul 10 '25 14:07 mathematicalmichael