dqlite icon indicating copy to clipboard operation
dqlite copied to clipboard

Consider periodically using VACUUM to reduce memory footprint

Open freeekanayaka opened this issue 1 year ago • 9 comments

I've experimented using VACUUM against a real-world medium-size Incus/LXD database (~700 containers) that had a size of 41 Megabytes. After running the VACUUM command, the size of the database was down to 3 Megabytes.

Depending on the workflow, if there are large databases that create issues with memory/snapshotting, this might be a relatively cheap and effective measure to explore. It might not have such a dramatic effect in all cases though.

freeekanayaka avatar Aug 17 '23 23:08 freeekanayaka

The database I mentioned had been running for several years, most probably performing a number of INSERTs and DELETEs over the course of time, which presumably slowly increased database fragmentation more and more. That means that running VACUUM regularly is advisable, especially for workloads with a lot of data updates.

I also tried to run .dump on that database and create it from scratch with the same data, and I got the same result: database size went from 41 MB to 3 MB, as it would be expected.

freeekanayaka avatar Aug 18 '23 14:08 freeekanayaka

Haven't looked into it yet, but weren't there issues with running VACUUM on a dqlite database, did you have to adapt some things?

MathieuBordere avatar Aug 18 '23 14:08 MathieuBordere

The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or [write-ahead log](https://www.sqlite.org/wal.html) WAL file is used just as it would be for any other database transaction.

I'm just wondering how it would work in dqlite's case

MathieuBordere avatar Aug 18 '23 14:08 MathieuBordere

Yeah, I looked into using VACUUM to reset/empty a database in #435 and concluded that a prerequisite was proper support for attaching additional databases to a dqlite-managed connection, see https://github.com/canonical/dqlite/pull/435#issuecomment-1331257989. But I could certainly be missing something.

cole-miller avatar Aug 18 '23 14:08 cole-miller

I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.

freeekanayaka avatar Aug 18 '23 15:08 freeekanayaka

I can think of several design options. But in my mind it's most probably something that's needed for long-running deployments.

yes agree that it should be scheduled regularly when e.g. # free pages / # total pages passes a certain threshold (just thinking out loud)

MathieuBordere avatar Aug 18 '23 16:08 MathieuBordere

If you are looking for something cheap, this could initially be implemented "offline", e.g. when the process restarts.

freeekanayaka avatar Aug 18 '23 17:08 freeekanayaka

VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.

cole-miller avatar Dec 04 '23 20:12 cole-miller

VACUUM needs to be coordinated using a Raft log entry, since it affects page numbers in the future.

Sounds like a good point, thanks.

freeekanayaka avatar Dec 06 '23 09:12 freeekanayaka