convex-backend
Logs and files are not being deleted on self-hosted
Actual blob files are not deleted from storage/files. Also, with a lot of updates the db grows out of control: even after deleting all tables, the system tables take up gigabytes.
Blob files are not deleted from disk. File storage deletes are treated as soft deletes. If you'd like hard deletes, that's something you have to handle yourself, and we recommend doing it asynchronously. With the default self-hosted configuration, the files live in a folder on disk. If you set up S3 storage, they live in S3.
As for system tables, there's a system table cleanup worker here that cleans up session requests. https://github.com/get-convex/convex-backend/blob/main/crates/application/src/system_table_cleanup/mod.rs#L143
By default it's configured to keep 2 weeks of mutations, but you can override that with an environment variable. https://github.com/get-convex/convex-backend/blob/main/crates/common/src/knobs.rs#L560
Thank you very much. Could you please share the best practice for deleting blob files, since their names are not the same as in geturl or get in the StorageAction writer? What would be the best way to work out which file to delete? I implemented a sha256 comparison, but it seems suboptimal. Maybe I can somehow get their name on creation (upload) and store it in a table? Anyway, thank you so much again!
Admittedly, it's not very easy to correlate the storage_id with the storage_key https://github.com/get-convex/convex/blob/main/crates/model/src/file_storage/types.rs#L23
There's a _file_storage system table that has the correlation, but it's not very easily accessible.
The inefficient sha256 comparison might be your workaround for now.
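For anyone trying the sha256 workaround, here is a minimal sketch of the idea, not an official tool. It assumes you have already exported the checksums of the files you still want to keep (for example from your app's file metadata) into a set, and that the blobs on disk use the `.blob` extension mentioned later in this thread. The function names, the directory argument, and the choice between base64 and hex encoding are all assumptions for illustration; check which encoding your exported checksums actually use before trusting the result.

```python
import base64
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> bytes:
    """Return the raw SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def find_orphaned_blobs(
    storage_dir: Path,
    live_hashes: set[str],
    encoding: str = "base64",  # assumption: pass "hex" if your checksums are hex
) -> list[Path]:
    """List .blob files whose checksum is not among the live checksums."""
    orphans = []
    for path in sorted(storage_dir.rglob("*.blob")):
        raw = sha256_file(path)
        if encoding == "base64":
            encoded = base64.b64encode(raw).decode()
        else:
            encoded = raw.hex()
        if encoded not in live_hashes:
            orphans.append(path)
    return orphans
```

The function only reports candidates and never deletes anything, so you can review the list first. Note that two uploads with identical contents share a checksum, so a soft-deleted blob is kept as long as any live file has the same bytes; that errs on the safe side.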
Eventually there are a couple of things I want us to implement on the Convex side:
- A nice admin tool to print system tables
- Some automated tool/script for cleaning up old soft-deleted storage data
Both of these could be written in Rust using helpers from the codebase, but neither exists yet. We have an internal version of the first one, but it's hard to open source. I'll see what I can do.
I'm starting to encounter the same issues as I test my app against some light traffic (image uploading is a core part of it, specifically for the purposes of figuring out which images to delete vs keep). So for my use case, I expected, and am experiencing, a lot of "stale files".
If the automated cleanup of soft deletes is not on the roadmap, I'm going to need to figure out some sort of solution in the interim, but to your point it's difficult to connect the .blob file to the storageId. I'd appreciate any guidance on the current state of "best practice" for ensuring a self-hosted deployment doesn't just keep growing in storage.
thank you for your time and consideration.
Right now, the best option is to use either creation time or sha256 to correlate the files that you list out with the blob files on disk.
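The creation-time variant could be sketched roughly as follows. This is an illustration under assumptions, not a confirmed procedure: it assumes each blob's modification time on disk is close to the creation time recorded for the file (in milliseconds since the epoch), that you also know each live file's size, and that a few seconds of tolerance is enough. `BlobInfo`, `LiveFile`, and `classify_blobs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobInfo:
    path: str
    size: int
    mtime_ms: float  # file modification time on disk, in milliseconds


@dataclass(frozen=True)
class LiveFile:
    storage_id: str
    size: int
    creation_time_ms: float  # creation time you listed for the file


def classify_blobs(blobs, live_files, tolerance_ms=5000.0):
    """Split blobs into matched, probably stale, and ambiguous.

    A blob matches a live file when the sizes are equal and the
    timestamps differ by at most tolerance_ms.
    """
    matched = {}    # blob path -> storage id (exactly one candidate)
    stale = []      # no live file matches: candidate for deletion
    ambiguous = []  # several live files match: needs a sha256 check
    for blob in blobs:
        candidates = [
            f for f in live_files
            if f.size == blob.size
            and abs(f.creation_time_ms - blob.mtime_ms) <= tolerance_ms
        ]
        if len(candidates) == 1:
            matched[blob.path] = candidates[0].storage_id
        elif not candidates:
            stale.append(blob.path)
        else:
            ambiguous.append(blob.path)
    return matched, stale, ambiguous
```

Treat the "stale" list as candidates only. Timestamps can drift (for example after a restore or copy of the storage folder), so confirming with a checksum before deleting anything is the safer route.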
Another option is to use a component like the R2 component (or make your own) to use with a third party file storage platform.
https://www.convex.dev/components/cloudflare-r2
Thank you! I tried out the R2 component and coded in eager deletions.
That worked beautifully, and the migrations component has been helpful in switching storage without breaking links.