unison icon indicating copy to clipboard operation
unison copied to clipboard

Pulling bloats sqlite db sizes

Open ChrisPenner opened this issue 3 years ago • 1 comments

Looks like most of the temp entities we create stick around in the DB after pulling (which is just how sqlite works), but it's bloating file-sizes a lot.

I haven't checked, and it's unlikely, but maybe this is also increasing memory usage, since I know sqlite will keep caches in memory.

  • Start with an empty codebase
  • pull unison.public.base base
  • Wait
$ du -sch unison.sqlite3
 160M    unison.sqlite3
 160M    total

$ sqlite3 unison.sqlite3                                                                                                                                                                                                 
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
sqlite> vacuum;

$ du -sch unison.sqlite3                                                                                                                                                                                                
 28M    unison.sqlite3
 28M    total

We should likely just VACUUM after each pull.

ChrisPenner avatar Aug 02 '22 20:08 ChrisPenner

Vacuuming after each pull sounds fine to me. That said, I don't think the size of the unused space would continue to grow monotonically with each pull. It's more like some constant space overhead associated with our usage pattern of deleting lots of rows.

mitchellwrosen avatar Aug 02 '22 20:08 mitchellwrosen

Q: What about for git syncs? A: Not relevant because the only deletions we have currently are the temp_entity rows, which don't apply to repos in git.

aryairani avatar Aug 10 '22 19:08 aryairani

I just learned a couple things from https://www.sqlite.org/lang_vacuum.html

A VACUUM will fail if there is an open transaction on the database connection that is attempting to run the VACUUM.

So far, so good. We can just retry in a loop like Transaction does on SQLITE_BUSY. This would be a new retry-loop, though, because you can't do the VACUUM itself from within a transaction, either.

Unfinalized SQL statements typically hold a read transaction open, so the VACUUM might fail if there are unfinalized SQL statements on the same connection.

This one is a bit concerning... were we to implement a prepared statement cache, those unfinalized statements "might" cause a vacuum to fail? :sweat:.

I don't think we'd want to finalize all of the statements in such a cache just to perform a vacuum. I mean, we might, but this extra disk space isn't hurting much.

A variant of this ticket - maybe we could just vacuum on startup of ucm?

mitchellwrosen avatar Aug 11 '22 15:08 mitchellwrosen

Oh, I'm not sure if the documentation is simply referring to statements that have sent a BEGIN but not a COMMIT yet. If so, well, that's expected. But initially I interpreted it to mean statements that have been prepared but not finalized -- and SQLite is mysteriously internally holding a read transaction open for such statements? That can't be right...

mitchellwrosen avatar Aug 11 '22 15:08 mitchellwrosen

I experimented a bit and determined the mere existence of unfinalized statements actually doesn't make VACUUM fail.

mitchellwrosen avatar Aug 11 '22 16:08 mitchellwrosen