unison
unison copied to clipboard
Pulling bloats sqlite db sizes
Looks like most of the temp entities we create stick around in the DB after pulling (which is just how sqlite works), but it's bloating file-sizes a lot.
I haven't checked, and it's unlikely, but maybe this is also increasing memory usage, since I know sqlite will keep caches in memory.
- Start with an empty codebase
pull unison.public.base base- Wait
$ du -sch unison.sqlite3
160M unison.sqlite3
160M total
$ sqlite3 unison.sqlite3
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
sqlite> vacuum;
$ du -sch unison.sqlite3
28M unison.sqlite3
28M total
We should likely just VACUUM after each pull.
Vacuuming after each pull sounds fine to me. That said, I don't think the size of the unused space would continue to grow monotonically with each pull. It's more like some constant space overhead associated with our usage pattern of deleting lots of rows.
Q: What about for git syncs? A: Not relevant because the only deletions we have currently are the temp_entity rows, which don't apply to repos in git.
I just learned a couple things from https://www.sqlite.org/lang_vacuum.html
A VACUUM will fail if there is an open transaction on the database connection that is attempting to run the VACUUM.
So far, so good. We can just retry in a loop like Transaction does on SQLITE_BUSY. This would be a new retry-loop, though, because you can't do the VACUUM itself from within a transaction, either.
Unfinalized SQL statements typically hold a read transaction open, so the VACUUM might fail if there are unfinalized SQL statements on the same connection.
This one is a bit concerning... were we to implement a prepared statement cache, those unfinalized statements "might" cause a vacuum to fail? :sweat:.
I don't think we'd want to finalize all of the statements in such a cache just to perform a vacuum. I mean, we might, but this extra disk space isn't hurting much.
A variant of this ticket - maybe we could just vacuum on startup of ucm?
Oh, I'm not sure if the documentation is simply referring to statements that have sent a BEGIN but not a COMMIT yet. If so, well, that's expected. But initially I interpreted it to mean statements that have been prepared but not finalized -- and SQLite is mysteriously internally holding a read transaction open for such statements? That can't be right...
I experimented a bit and determined the mere existence of unfinalized statements actually doesn't make VACUUM fail.