flow-go
[Database] Compress values stored in badgerDB
Closes https://github.com/onflow/flow-go/issues/5402
Summary
This PR uses Snappy to compress the data stored in badgerDB. Benchmarks show significant storage savings, especially for chunk data pack storage, which accounts for more than 90% of disk usage.
First attempt
Initially I tried the Snappy option built into Badger; however, the benchmark showed no difference, and the data was still uncompressed.
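For reference, the built-in option I tried was along these lines (a sketch assuming Badger v2's table-level compression API; the exact wiring in flow-go's option setup may differ):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

func main() {
	// Enable Badger's built-in (table-level) Snappy compression.
	// Note: this only compresses SSTable blocks, which are produced during
	// compaction, so freshly written data can remain uncompressed on disk.
	opts := badger.DefaultOptions("/tmp/badger-compressed").
		WithCompression(options.Snappy)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```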
Second attempt
In order to compress the data, I need to manually compress each value before storing it in Badger. I chose the Snappy algorithm, as it doesn't sacrifice much speed and still achieves a good compression ratio.
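Concretely, this amounts to wrapping the already-encoded value with a Snappy pass on the write path and the inverse on the read path. A minimal sketch using github.com/golang/snappy (the helper names are illustrative, not the PR's actual functions):

```go
package storage

import (
	"github.com/golang/snappy"
)

// compress wraps an already-encoded value with Snappy before it is
// written to Badger. The Snappy block format records the uncompressed
// length itself, so no extra framing is needed.
func compress(encoded []byte) []byte {
	return snappy.Encode(nil, encoded)
}

// decompress reverses compress after the value is read back from Badger,
// returning the original encoded bytes ready for decoding.
func decompress(stored []byte) ([]byte, error) {
	return snappy.Decode(nil, stored)
}
```

On the write path compression runs after the value is encoded; on the read path decompression runs before the value is decoded.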
Benchmark Storage
I ran a localnet instance with the benchmark tool sending transactions for 5 minutes. The results show protocol data is reduced by 10% and chunk data pack data by 65%:
# Without compression
execution protocol data: 4.4MB
execution chunk data pack: 124MB
# With compression
execution protocol data: 4MB
execution chunk data pack: 43MB
Benchmark Speed of pure encoding and decoding
Pure encoding is 28% slower, and pure decoding is 3.6 times slower:
# encoding without compression
BenchmarkEncodeWithoutCompress-10 715384 1671 ns/op 2297 B/op 17 allocs/op
# encoding with compression
BenchmarkEncodeAndCompress-10 520813 2281 ns/op 3193 B/op 18 allocs/op
# decoding without compression
BenchmarkDecodeWithoutUncompress-10 6552171 168.9 ns/op 224 B/op 3 allocs/op
# decoding with compression
BenchmarkDecodeUncompressed-10 1981490 598.3 ns/op 1123 B/op 7 allocs/op
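For context, the numbers above come from standard Go benchmarks; the compressed-encoding case has roughly the following shape (the fixture type and the msgpack codec here are assumptions for illustration, not the exact benchmark bodies from this PR):

```go
package storage

import (
	"testing"

	"github.com/golang/snappy"
	"github.com/vmihailenco/msgpack/v4"
)

// fixture is a stand-in for the stored entity being encoded.
type fixture struct {
	ID   [32]byte
	Data []byte
}

func BenchmarkEncodeWithoutCompress(b *testing.B) {
	val := fixture{Data: make([]byte, 1024)}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := msgpack.Marshal(&val); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkEncodeAndCompress(b *testing.B) {
	val := fixture{Data: make([]byte, 1024)}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		enc, err := msgpack.Marshal(&val)
		if err != nil {
			b.Fatal(err)
		}
		// The extra Snappy pass is the only difference from the baseline.
		_ = snappy.Encode(nil, enc)
	}
}
```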
Benchmark Speed of round trip saving and reading from database
It shows a 3% slowdown in writes and a 5% slowdown in reads (but also a 55% reduction in memory usage in reads):
# encoding without compression and saving to database
BenchmarkReadResult-10 479792 2156 ns/op 2212 B/op 7 allocs/op
# encoding with compression and saving to database
BenchmarkReadResult-10 499564 2230 ns/op 985 B/op 6 allocs/op
# reading from database and decoding without compression
BenchmarkSaveResult-10 22549 51149 ns/op 6604 B/op 80 allocs/op
# reading from database and decoding with compression
BenchmarkSaveResult-10 23582 48175 ns/op 8501 B/op 82 allocs/op
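The round-trip case measures saving and reading back against a real Badger instance. A minimal sketch of the read side with compression, assuming an in-memory Badger DB and an illustrative key/value fixture (not the PR's actual benchmark code):

```go
package storage

import (
	"testing"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/golang/snappy"
)

func BenchmarkReadResult(b *testing.B) {
	// In-memory Badger keeps the benchmark self-contained and avoids disk noise.
	db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true))
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	key := []byte("result/1")
	value := snappy.Encode(nil, make([]byte, 2048)) // pre-encoded, compressed fixture
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, value)
	}); err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		err := db.View(func(txn *badger.Txn) error {
			item, err := txn.Get(key)
			if err != nil {
				return err
			}
			// Decompress inside the value callback, mirroring the read path.
			return item.Value(func(val []byte) error {
				_, err := snappy.Decode(nil, val)
				return err
			})
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```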
Codecov Report
Attention: Patch coverage is 75.00000% with 18 lines in your changes missing coverage. Please review.
Project coverage is 55.68%. Comparing base (5429925) to head (6994eff).
Additional details and impacted files
@@ Coverage Diff @@
## master #5496 +/- ##
==========================================
+ Coverage 55.65% 55.68% +0.02%
==========================================
Files 1041 1042 +1
Lines 101935 101988 +53
==========================================
+ Hits 56729 56788 +59
+ Misses 40846 40843 -3
+ Partials 4360 4357 -3
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 55.68% <75.00%> (+0.02%) :arrow_up: | |
Flags with carried forward coverage won't be shown.
Maybe add memory usage to the benchmarks posted here: B/op and allocs/op columns.
Initially I tried https://github.com/onflow/flow-go/pull/4495; however, the benchmark showed no difference, and the data was still uncompressed.
Probably it was still in the vlog (not compacted); Badger compresses when compacting (in the background). But this approach is much better; the other one is really hard to benchmark.
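If we ever want to benchmark the built-in option again, one possible approach (just a sketch, not something done in this PR) is to force compaction and value-log GC before measuring on-disk size, so table-level compression has actually been applied:

```go
package main

import (
	"errors"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// forceCompaction flattens the LSM tree and garbage-collects the value log
// so that Badger's table-level compression has been applied to the data on
// disk before the directory size is measured.
func forceCompaction(db *badger.DB) error {
	if err := db.Flatten(2); err != nil {
		return err
	}
	for {
		err := db.RunValueLogGC(0.5)
		if errors.Is(err, badger.ErrNoRewrite) {
			return nil // nothing left to rewrite
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-compressed"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := forceCompaction(db); err != nil {
		log.Fatal(err)
	}
}
```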
Closing for now, since I ran into an issue implementing the migration of all the Badger values to the compressed format. Will reopen once https://github.com/onflow/flow-go/pull/5627 is opened and merged.