Question: Shrink BoltDB File?

Open kylebrandt opened this issue 8 years ago • 13 comments

We have a database that has grown to 4 GB. A lot of it is duplicated data that we are going to remove. Is there any way to compact the BoltDB file once the data is removed?

kylebrandt • Sep 15 '15

Unfortunately, there's no built-in way to compact a database. You can write a simple program that iterates over the buckets in the source database and copies them into a new database; that achieves the same thing as compacting.

benbjohnson • Sep 20 '15

I've found that simply using tx.WriteTo to back up certain DBs dramatically reduces the size (while obviously keeping the live data).

I don't have any useful metrics/benchmarks, but I could try to put one together and see if it is an actual benefit.

What kind of duplication and scale are we looking at here? Are large values being removed completely, shrunk, etc.?

chrsm • Sep 24 '15

@benbjohnson, is this a feature you'd be interested in having in bolt?

ckingdev • Oct 27 '15

@cpalone It'd be nice to do a recursive copy of every bucket instead of simply using Tx.WriteTo. That would eliminate excess free pages. You can do a simple version where everything happens in one transaction, but a more efficient implementation would copy in batches of 1,000 key/value pairs at a time. That gets a bit more complicated, though.

benbjohnson • Oct 28 '15

I can't give you a timeline on this but I'll give the batch implementation a shot!

ckingdev • Oct 29 '15

I would like to have this feature as well. It would be nice to be able to shrink the DB in place while it is still in use.

chnrxn • Nov 19 '15

Unfortunately, it looks like there is no way to quickly find the path to an arbitrary page on disk. Without that ability, compacting the DB in place is O(n^2). Am I right?

funny-falcon • Apr 22 '16

@funny-falcon That's correct. Given a page ID, there's no way to find its parents without looking through all the branches, although branch data tends to be relatively small. You could work your way from the end of the database towards the beginning, reallocating pages and their parents, to essentially do a defragmentation. Then truncate the database at the highest unfreed page and adjust the freelist.

However, rewriting the data file to a new data file may very well be faster because of the time it takes to find the parents.

benbjohnson • Apr 26 '16
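To make the cost argument concrete, here is a toy in-memory model of in-place defragmentation. It does not use bolt's page format: page, file, findParent and defrag are all hypothetical. Live pages are moved from the end of the file into the lowest free slots, and each move needs a linear scan to find the parent that points at the page. Finally the file is truncated after the highest live page. With n pages that is O(n^2) in the worst case. Real bolt pages are copy-on-write, so a moved page's parent would itself have to be reallocated rather than patched in place as the toy does.

```go
package main

// page is a toy stand-in for a B+tree page: it only records the IDs of the
// child pages it points to (none for leaves). Not bolt's on-disk format.
type page struct {
	children []int
}

// file models a data file: pages[id] == nil means page id is free.
// Page 0 is treated as the root and is never moved.
type file struct {
	pages []*page
}

// findParent scans every live page for a pointer to id. Without a parent
// index this linear scan is the only option: O(n) per lookup.
func (f *file) findParent(id int) (parent, slot int) {
	for pid, p := range f.pages {
		if p == nil {
			continue
		}
		for i, c := range p.children {
			if c == id {
				return pid, i
			}
		}
	}
	return -1, -1
}

// defrag moves live pages from the end of the file into the lowest free
// slots, repoints each moved page's parent, then truncates the file after
// the highest live page. Every live non-root page is assumed to have a parent.
func (f *file) defrag() {
	var free []int // ascending, because ids are scanned in order
	for id, p := range f.pages {
		if p == nil {
			free = append(free, id)
		}
	}
	for hi := len(f.pages) - 1; hi > 0 && len(free) > 0; hi-- {
		if f.pages[hi] == nil {
			continue
		}
		lo := free[0]
		if lo >= hi {
			break // no free slot below this page: nothing left to move
		}
		free = free[1:]
		pid, slot := f.findParent(hi)
		f.pages[lo], f.pages[hi] = f.pages[hi], nil
		f.pages[pid].children[slot] = lo
	}
	// Truncate trailing free pages; free slots below end stay on the freelist.
	end := len(f.pages)
	for end > 0 && f.pages[end-1] == nil {
		end--
	}
	f.pages = f.pages[:end]
}
```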

I now need to shrink my bolt DB file, although I'm happy to do it offline. I was thinking of writing a CLI tool for this, but what about adding something like bolt rewrite to the existing bolt CLI tool?

tg • Jul 26 '16

The goal is to add a bolt shrink command to the CLI tool but I haven't had the time so far. Feel free to open a pull request for it though!

benbjohnson • Jul 27 '16

Pull request #460 already does this.

vincent-petithory • Jul 27 '16

@vincent-petithory Ah, sorry I missed it! I'll take a look and review it.

benbjohnson • Jul 27 '16

Ah, cool, this will definitely save me some time, especially since it turned out not to be so trivial for larger DBs, where transactions need to be committed periodically.

I also just realised that bolt.Bucket has a sequencing feature, available via Bucket.NextSequence(). I'm not sure this is an issue, but there is no interface for reading or setting this value, so a simple key/value rewrite will lose the sequence number. I guess if we want to preserve it, we would need to extend the Bucket interface.

tg • Jul 27 '16