Question: Shrink BoltDB File?
We have a database that has grown to 4 GB. It contains a lot of duplicated data that is going to be removed. Is there any way to compact the BoltDB file once the data is removed?
Unfortunately there's not a built-in way to compact a database. You can write a simple program to iterate over buckets from a source database and copy them over to a new database and that'll do the same thing as compacting.
I've found that simply using tx.WriteTo to back up certain DBs dramatically reduces the size (while obviously keeping the live data).
I don't have any useful metrics/benchmarks, but I could try to put one together and see if it is an actual benefit.
What kind of duplication and scale are we looking at here? Are large values being removed completely, shrunk, etc.?
@benbjohnson is this a feature you're interested in bolt having?
@cpalone It'd be nice to do a recursive copy of every bucket instead of simply using Tx.WriteTo. That would eliminate any excess free pages. You can do a simple version where it all happens in one transaction, but a more efficient implementation would be to batch it at 1000 key/value pairs at a time. That gets a bit more complicated though.
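To make the batching idea concrete, here is a rough sketch that does not use the bolt API at all. `Bucket`, `Sink`, `MemSink` and `CopyBatched` are hypothetical stand-ins I made up for illustration: `Bucket` models a bucket with nested children, and `Sink` models a destination whose `Commit` ends the current write transaction. A real implementation would map these onto bolt transactions and would need more care than this.

```go
package main

import "strings"

// Bucket is a hypothetical in-memory model of a bucket: plain key/value
// pairs plus nested child buckets. It is not the bolt type.
type Bucket struct {
	Values   map[string][]byte
	Children map[string]*Bucket
}

// Sink is a hypothetical destination. Commit stands for ending the current
// write transaction and starting a new one.
type Sink interface {
	CreateBucket(path []string) error
	Put(path []string, key string, value []byte) error
	Commit() error
}

// CopyBatched walks src recursively and writes every pair to dst. It commits
// after every batchSize writes and once more at the end for the remainder.
// A batchSize <= 0 results in a single commit at the end.
func CopyBatched(src *Bucket, dst Sink, batchSize int) error {
	pending := 0
	var walk func(path []string, b *Bucket) error
	walk = func(path []string, b *Bucket) error {
		for k, v := range b.Values {
			if err := dst.Put(path, k, v); err != nil {
				return err
			}
			pending++
			if pending == batchSize {
				if err := dst.Commit(); err != nil {
					return err
				}
				pending = 0
			}
		}
		for name, child := range b.Children {
			// Copy the path so sibling buckets never share a backing array.
			childPath := append(append([]string{}, path...), name)
			if err := dst.CreateBucket(childPath); err != nil {
				return err
			}
			if err := walk(childPath, child); err != nil {
				return err
			}
		}
		return nil
	}
	if err := walk(nil, src); err != nil {
		return err
	}
	if pending > 0 {
		return dst.Commit()
	}
	return nil
}

// MemSink is a toy Sink that stores pairs under "bucket/.../key" and counts
// commits, so the batching behaviour can be observed.
type MemSink struct {
	Data    map[string][]byte
	Commits int
}

func (s *MemSink) CreateBucket(path []string) error { return nil }

func (s *MemSink) Put(path []string, key string, value []byte) error {
	full := append(append([]string{}, path...), key)
	s.Data[strings.Join(full, "/")] = value
	return nil
}

func (s *MemSink) Commit() error {
	s.Commits++
	return nil
}
```

The part that "gets a bit more complicated" is mostly hidden behind `Sink` here: with a real database, anything obtained inside a transaction stops being valid once that transaction is committed, so the copy would have to re-open the destination buckets by path after each commit.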
I can't give you a timeline on this but I'll give the batch implementation a shot!
I would like to have this feature as well. It would be nice to be able to shrink the DB in place while it is still in use.
Unfortunately, it looks like there is no way to quickly find a path to an arbitrary page on disk. Without this ability, compacting the DB in place is O(n^2). Am I right?
@funny-falcon That's correct. Given a page ID there's no way to find its parents without looking through all the branches, although branch data tends to be relatively small. You could work your way from the end of the database towards the beginning and reallocate pages and their parents to essentially do a defragmentation. Then truncate the database at the highest unfreed page and adjust the freelist.
However, rewriting the data file to a new data file may very well be faster because of the time it takes to find the parents.
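For illustration only, here is a toy model of two pieces of that in-place approach: the parent lookup that makes it expensive, and the final truncation step. Nothing here touches bolt's on-disk format. `FindParent` and `TruncationPoint` are hypothetical helpers, the branch map and the freelist slice are simplified assumptions, and the hard part (actually relocating pages towards the front) is left out.

```go
package main

// FindParent returns the ID of the branch page that lists child. The map is
// a hypothetical model (branch page ID -> child page IDs). Pages carry no
// back-pointer to their parent, so in the worst case every branch is scanned.
func FindParent(branches map[uint64][]uint64, child uint64) (uint64, bool) {
	for id, children := range branches {
		for _, c := range children {
			if c == child {
				return id, true
			}
		}
	}
	return 0, false
}

// TruncationPoint returns how many pages to keep once the tail of the file
// consists only of free pages, plus the freelist with the dropped page IDs
// removed. pageCount is the current number of pages in the file.
func TruncationPoint(pageCount uint64, free []uint64) (keep uint64, trimmed []uint64) {
	isFree := make(map[uint64]bool, len(free))
	for _, id := range free {
		isFree[id] = true
	}
	// Walk back from the last page until a page that is still in use.
	keep = pageCount
	for keep > 0 && isFree[keep-1] {
		keep--
	}
	// Free pages past the new end of the file no longer exist.
	for _, id := range free {
		if id < keep {
			trimmed = append(trimmed, id)
		}
	}
	return keep, trimmed
}
```

With 10 pages and free pages 3, 7, 8 and 9, this keeps 7 pages and leaves only page 3 on the freelist. Calling something like `FindParent` once per relocated page is where the O(n^2) worry comes from.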
I'm now in need of shrinking my bolt DB file, although I'm happy to do this offline. I was thinking of creating a CLI tool for this, but what about adding something like bolt rewrite to the existing bolt CLI tool?
The goal is to add a bolt shrink command to the CLI tool but I haven't had the time so far. Feel free to open a pull request for it though!
There is #460 doing this job.
@vincent-petithory Ah, I'm sorry! I'll take a look and review it. I'm sorry I missed it.
Ah, cool, this will definitely save me some time, especially since it turned out that this is not so trivial for larger DBs, as we need to commit transactions periodically.
Also just realised bolt.Bucket has a sequencing feature available via Bucket.NextSequence(). Not sure this is an issue, but there is no interface for reading or setting this value, hence a simple key-value rewrite will lose the sequence number. I guess if we want to preserve these we would need to extend the Bucket interface though.