flow-go
[Database] Compress values stored in badgerDB
Closes https://github.com/onflow/flow-go/issues/5402
Summary
This PR uses Snappy to compress the data stored in badgerDB. Benchmarks show significant storage savings, especially for chunk data pack storage, which accounts for more than 90% of disk usage.
First attempt
Initially I tried the Snappy option built into Badger; however, the benchmark showed no difference, and the data was still uncompressed.
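For reference, the built-in option I tried was along these lines (a sketch assuming Badger v2's table-level compression API; the exact wiring in flow-go's option setup may differ):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

func main() {
	// Enable Badger's built-in (table-level) Snappy compression.
	// Note: this only compresses SSTable blocks, which are produced during
	// compaction, so freshly written data can remain uncompressed on disk.
	opts := badger.DefaultOptions("/tmp/badger-compressed").
		WithCompression(options.Snappy)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```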
Second attempt
In order to compress the data, I need to manually compress each value before storing it in Badger. I chose the Snappy algorithm, as it doesn't sacrifice much speed and still achieves a good compression ratio.
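Concretely, this amounts to wrapping the already-encoded value with a Snappy pass on the write path and the inverse on the read path. A minimal sketch using github.com/golang/snappy (the helper names are illustrative, not the PR's actual functions):

```go
package storage

import (
	"github.com/golang/snappy"
)

// compress wraps an already-encoded value with Snappy before it is
// written to Badger. The Snappy block format records the uncompressed
// length itself, so no extra framing is needed.
func compress(encoded []byte) []byte {
	return snappy.Encode(nil, encoded)
}

// decompress reverses compress after the value is read back from Badger,
// returning the original encoded bytes ready for decoding.
func decompress(stored []byte) ([]byte, error) {
	return snappy.Decode(nil, stored)
}
```

On the write path compression runs after the value is encoded; on the read path decompression runs before the value is decoded.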
Benchmark Storage
I ran a localnet instance with the benchmark tool sending transactions for 5 minutes. The results show protocol data is reduced by 10% and chunk data pack data by 65%:
# Without compression
execution protocol data: 4.4MB
execution chunk data pack: 124MB
# With compression
execution protocol data: 4MB
execution chunk data pack: 43MB
Benchmark Speed of pure encoding and decoding
Pure encoding is 28% slower, and pure decoding is 3.6 times slower:
# encoding without compression
BenchmarkEncodeWithoutCompress-10 715384 1671 ns/op 2297 B/op 17 allocs/op
# encoding with compression
BenchmarkEncodeAndCompress-10 520813 2281 ns/op 3193 B/op 18 allocs/op
# decoding without compression
BenchmarkDecodeWithoutUncompress-10 6552171 168.9 ns/op 224 B/op 3 allocs/op
# decoding with compression
BenchmarkDecodeUncompressed-10 1981490 598.3 ns/op 1123 B/op 7 allocs/op
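For context, the numbers above come from standard Go benchmarks; the compressed-encoding case has roughly the following shape (the fixture type and the msgpack codec here are assumptions for illustration, not the exact benchmark bodies from this PR):

```go
package storage

import (
	"testing"

	"github.com/golang/snappy"
	"github.com/vmihailenco/msgpack/v4"
)

// fixture is a stand-in for the stored entity being encoded.
type fixture struct {
	ID   [32]byte
	Data []byte
}

func BenchmarkEncodeWithoutCompress(b *testing.B) {
	val := fixture{Data: make([]byte, 1024)}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := msgpack.Marshal(&val); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkEncodeAndCompress(b *testing.B) {
	val := fixture{Data: make([]byte, 1024)}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		enc, err := msgpack.Marshal(&val)
		if err != nil {
			b.Fatal(err)
		}
		// The extra Snappy pass is the only difference from the baseline.
		_ = snappy.Encode(nil, enc)
	}
}
```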
Benchmark Speed of round trip saving and reading from database
It shows a 3% slowdown in writes and a 5% slowdown in reads (but also a 55% reduction in memory usage in reads):
# encoding without compression and saving to database
BenchmarkReadResult-10 479792 2156 ns/op 2212 B/op 7 allocs/op
# encoding with compression and saving to database
BenchmarkReadResult-10 499564 2230 ns/op 985 B/op 6 allocs/op
# reading from database and decoding without compression
BenchmarkSaveResult-10 22549 51149 ns/op 6604 B/op 80 allocs/op
# reading from database and decoding with compression
BenchmarkSaveResult-10 23582 48175 ns/op 8501 B/op 82 allocs/op
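The round-trip case measures saving and reading back against a real Badger instance. A minimal sketch of the read side with compression, assuming an in-memory Badger DB and an illustrative key/value fixture (not the PR's actual benchmark code):

```go
package storage

import (
	"testing"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/golang/snappy"
)

func BenchmarkReadResult(b *testing.B) {
	// In-memory Badger keeps the benchmark self-contained and avoids disk noise.
	db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true))
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	key := []byte("result/1")
	value := snappy.Encode(nil, make([]byte, 2048)) // pre-encoded, compressed fixture
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, value)
	}); err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		err := db.View(func(txn *badger.Txn) error {
			item, err := txn.Get(key)
			if err != nil {
				return err
			}
			// Decompress inside the value callback, mirroring the read path.
			return item.Value(func(val []byte) error {
				_, err := snappy.Decode(nil, val)
				return err
			})
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```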
Codecov Report
Attention: Patch coverage is 75.00000% with 18 lines in your changes missing coverage. Please review.
Project coverage is 55.68%. Comparing base (5429925) to head (6994eff).
Additional details and impacted files
@@ Coverage Diff @@
## master #5496 +/- ##
==========================================
+ Coverage 55.65% 55.68% +0.02%
==========================================
Files 1041 1042 +1
Lines 101935 101988 +53
==========================================
+ Hits 56729 56788 +59
+ Misses 40846 40843 -3
+ Partials 4360 4357 -3
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 55.68% <75.00%> (+0.02%) :arrow_up: | |
Flags with carried forward coverage won't be shown.
Maybe add memory usage to the benchmarks posted here: B/op and allocs/op columns.
Initially I tried https://github.com/onflow/flow-go/pull/4495; however, the benchmark showed no difference, and the data was still uncompressed.
Probably it was still in the vlog (not compacted); Badger compresses when compacting (in the background). But this approach is much better; the other one is really hard to benchmark.
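If we ever want to benchmark the built-in option again, one possible approach (just a sketch, not something done in this PR) is to force compaction and value-log GC before measuring on-disk size, so table-level compression has actually been applied:

```go
package main

import (
	"errors"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// forceCompaction flattens the LSM tree and garbage-collects the value log
// so that Badger's table-level compression has been applied to the data on
// disk before the directory size is measured.
func forceCompaction(db *badger.DB) error {
	if err := db.Flatten(2); err != nil {
		return err
	}
	for {
		err := db.RunValueLogGC(0.5)
		if errors.Is(err, badger.ErrNoRewrite) {
			return nil // nothing left to rewrite
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-compressed"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := forceCompaction(db); err != nil {
		log.Fatal(err)
	}
}
```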
Closing for now, since I ran into an issue implementing the migration of all the Badger values to the compressed format. Will reopen once https://github.com/onflow/flow-go/pull/5627 is opened and merged.