BoltDB Performance example?
I'm trying to do a simple test of boltdb's performance. I ran batches of 100,000 inserts up to 100 million records, then tried to read all 100 million records back from that bucket. I tested on a c1.xlarge with an EBS SSD-backed data volume, and I'm pretty pleased with the results even on that older instance type. I made sure to write more data than would fit in RAM so the disk was properly taxed.
Inserts took 37m 51s at 44,000 writes per second
Reads took 26m 28s at 62,000 reads per second
File on disk was 31GB
Code was: https://gist.github.com/jiminoc/e2c772e9eafce2901364
The question is: are there any low-hanging-fruit knobs I can turn to increase performance? Are there any parts of the code that I shouldn't be doing (e.g. opening/closing the DB file around each 100K batch)?
thanks! Jim
@jiminoc If you insert the keys sequentially, it can improve performance significantly.
https://gist.github.com/jiminoc/e2c772e9eafce2901364#file-boltdb-go-L33. Store the id as its byte representation, not as a human-readable string.
@jiminoc As @xiang90 said, sequential writes are significantly faster. Also, shorter keys are more efficient. If you made a bucket with the key:
6a204bd89f3c8348afd5c77c717a097a:details:2413fb3709b05939f04cf2e92f7d0897fc2596f9ad0b8a9ea855c7bfebaae892
and then inserted the ids as big-endian byte-slice keys, you'd get much better performance:
import "encoding/binary"
...
func getKey(id int) []byte {
key := make([]byte, 8)
binary.BigEndian.PutUint64(uint64(id), key)
return key
}
That encoding makes the keys sort, and therefore insert, in sequential order.
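To make that concrete, here's a minimal end-to-end sketch (the perf.db path and "details" bucket name are placeholders, not taken from the gist):

package main

import (
    "encoding/binary"
    "log"

    "github.com/boltdb/bolt"
)

func getKey(id int) []byte {
    key := make([]byte, 8)
    binary.BigEndian.PutUint64(key, uint64(id))
    return key
}

func main() {
    db, err := bolt.Open("perf.db", 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // One batch of writes in a single transaction; the keys ascend,
    // so every Put appends at the right edge of the B+tree instead of
    // splitting random interior pages.
    if err := db.Update(func(tx *bolt.Tx) error {
        b, err := tx.CreateBucketIfNotExists([]byte("details"))
        if err != nil {
            return err
        }
        for i := 0; i < 100000; i++ {
            if err := b.Put(getKey(i), []byte("value")); err != nil {
                return err
            }
        }
        return nil
    }); err != nil {
        log.Fatal(err)
    }
}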
If you're always going to append keys (and never insert randomly) then you can also set Bucket.FillPercent = 1.0, although it's easy to shoot yourself in the foot with that if you're not sure whether you'll insert randomly at some point in the future.
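Note that FillPercent is a field on the per-transaction bucket handle, so it has to be set inside each write transaction. A minimal sketch, reusing the hypothetical "details" bucket from above (nextID is an illustrative counter, not from the gist):

err := db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("details"))
    // Pack leaf pages completely; only safe for strictly append-only keys.
    b.FillPercent = 1.0
    return b.Put(getKey(nextID), []byte("value"))
})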
I'm surprised your reads are so slow since you're doing a range scan. I would expect it to be in the millions of keys per second.
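For reference, a range scan in Bolt means iterating with a cursor inside a single read transaction rather than doing a lookup per key; a minimal sketch over the same hypothetical "details" bucket:

err := db.View(func(tx *bolt.Tx) error {
    c := tx.Bucket([]byte("details")).Cursor()
    count := 0
    // Each Next() is a cursor step through the B+tree leaves,
    // not a fresh search from the root.
    for k, _ := c.First(); k != nil; k, _ = c.Next() {
        count++
    }
    log.Printf("scanned %d keys", count)
    return nil
})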
@benbjohnson When the db file fits in memory I was getting 2.5 million reads per second at 10 million items (3.1GB file). Then I went to 100 million and it dropped to 62K.
Why do you open and close the database on every read and insert? Why not use Batch with parallel insertions?
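DB.Batch only helps when it's called from multiple goroutines at once; it coalesces their writes into a single commit. A rough sketch of that suggestion (the goroutine count and key scheme are illustrative, not from the gist):

var wg sync.WaitGroup
for g := 0; g < 8; g++ {
    wg.Add(1)
    go func(g int) {
        defer wg.Done()
        for i := g; i < 100000; i += 8 {
            id := i
            // Concurrent Batch calls are merged into one transaction,
            // amortizing the fsync across all callers.
            if err := db.Batch(func(tx *bolt.Tx) error {
                return tx.Bucket([]byte("details")).Put(getKey(id), []byte("value"))
            }); err != nil {
                log.Fatal(err)
            }
        }
    }(g)
}
wg.Wait()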
@benbjohnson I'm using time.UnixNano() int64 values as keys with the method you describe, but I went with LittleEndian. Is BigEndian the right way to make sequential []byte keys in Bolt?
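For what it's worth, Bolt orders keys with bytes.Compare, and big-endian is the encoding whose byte order matches numeric order; a small illustration:

a, b := make([]byte, 8), make([]byte, 8)

binary.BigEndian.PutUint64(a, 255) // 00 00 00 00 00 00 00 ff
binary.BigEndian.PutUint64(b, 256) // 00 00 00 00 00 00 01 00
fmt.Println(bytes.Compare(a, b))   // -1: byte order agrees with numeric order

binary.LittleEndian.PutUint64(a, 255) // ff 00 00 00 00 00 00 00
binary.LittleEndian.PutUint64(b, 256) // 00 01 00 00 00 00 00 00
fmt.Println(bytes.Compare(a, b))      // 1: 255 sorts after 256

So with little-endian keys, successive timestamps don't land at the end of the bucket, which forfeits the sequential-insert advantage.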