dqlite
dqlite copied to clipboard
Consider periodically using VACUUM to reduce memory footprint
I've experimented using VACUUM
against a real-world medium-size Incus/LXD database (~700 containers) that had a size of 41 Megabytes. After running the VACUUM
command, the size of the database was down to 3 Megabytes.
Depending on the workflow, if there are large databases that create issues with memory/snapshotting, this might be a relatively cheap and effective measure to explore. It might not have such a dramatic effect in all cases though.
The database I mentioned had been running for several years, most probably performing a number of INSERTs and DELETEs over the course of time, which presumably slowly increased database fragmentation more and more. That means that running VACUUM regularly is advisable, especially for workloads with a lot of data updates.
I also tried to run .dump
on that database and create it from scratch with the same data, and I got the same result: database size went from 41 MB to 3 MB, as it would be expected.
Haven't looked into it yet, but weren't there issues with running VACUUM
on a dqlite database, did you have to adapt some things?
The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or [write-ahead log](https://www.sqlite.org/wal.html) WAL file is used just as it would be for any other database transaction.
I'm just wondering how it would work in dqlite's case
Yeah, I looked into using VACUUM to reset/empty a database in #435 and concluded that a prerequisite was proper support for attaching additional databases to a dqlite-managed connection, see https://github.com/canonical/dqlite/pull/435#issuecomment-1331257989. But I could certainly be missing something.
I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.
I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.
yes agree that it should be scheduled regularly when e.g. # free pages / # total pages
passes a certain threshold (just thinking out loud)
If you are looking for something cheap, this could initially be implemented "offline", e.g. when the process restarts.
VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.
VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.
Sounds like a good point, thanks.