
[Database] Compress values stored in badgerDB

zhangchiqing opened this pull request 1 year ago • 3 comments

Closes https://github.com/onflow/flow-go/issues/5402

Summary

This PR uses snappy to compress the data stored in badgerDB. Benchmarks show significant storage savings, especially for chunk data pack storage, which accounts for more than 90% of disk usage.

First attempt

Initially I tried the Snappy options built into badger; however, the benchmark showed no difference: the data was still uncompressed.
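
For reference, badger's built-in option from that first attempt is configured roughly like this. This is a sketch against the badger v3 API (the database path is hypothetical); note that badger applies block compression only when SSTables are built during flush/compaction, so values still sitting in the value log remain uncompressed, which matches the "no difference" result above.

```go
package main

import (
	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/options"
)

func main() {
	// WithCompression enables per-block compression of LSM tables.
	// It does NOT compress values in the value log, and takes effect
	// only as tables are written in the background.
	opts := badger.DefaultOptions("/path/to/db").
		WithCompression(options.Snappy)

	db, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```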

Second attempt

In order to compress the data, I need to manually compress the values before storing them in badger. I chose the Snappy algorithm, since it doesn't sacrifice too much speed while still achieving a good compression ratio.

Benchmark Storage

I ran a localnet instance with the benchmark tool sending transactions for 5 minutes. The results show that protocol data is reduced by about 10%, and chunk data pack storage by about 65%:

# Without compression
execution protocol data: 4.4MB
execution chunk data pack: 124MB

# With compression
execution protocol data: 4MB
execution chunk data pack: 43MB

Benchmark Speed of pure encoding and decoding

Pure encoding is about 37% slower (2281 vs 1671 ns/op), and pure decoding about 3.5 times slower (598.3 vs 168.9 ns/op):

# encoding without compression
BenchmarkEncodeWithoutCompress-10         715384              1671 ns/op            2297 B/op         17 allocs/op
# encoding with compression
BenchmarkEncodeAndCompress-10             520813              2281 ns/op            3193 B/op         18 allocs/op

# decoding without compression
BenchmarkDecodeWithoutUncompress-10      6552171               168.9 ns/op           224 B/op          3 allocs/op
# decoding with compression
BenchmarkDecodeUncompressed-10           1981490               598.3 ns/op          1123 B/op          7 allocs/op

Benchmark Speed of round trip saving and reading from database

Reads are about 3% slower but use 55% less memory; writes are actually about 6% faster (note: the comments below are matched to the benchmark names):

# reading from database and decoding without compression
BenchmarkReadResult-10            479792              2156 ns/op            2212 B/op          7 allocs/op
# reading from database and decoding with compression
BenchmarkReadResult-10            499564              2230 ns/op             985 B/op          6 allocs/op

# encoding without compression and saving to database
BenchmarkSaveResult-10             22549             51149 ns/op            6604 B/op         80 allocs/op
# encoding with compression and saving to database
BenchmarkSaveResult-10             23582             48175 ns/op            8501 B/op         82 allocs/op

zhangchiqing avatar Mar 01 '24 21:03 zhangchiqing

Codecov Report

Attention: Patch coverage is 75.00000% with 18 lines in your changes missing coverage. Please review.

Project coverage is 55.68%. Comparing base (5429925) to head (6994eff).

Files Patch % Lines
storage/badger/operation/init.go 38.88% 10 Missing and 1 partial :warning:
storage/badger/operation/codec.go 90.24% 4 Missing :warning:
storage/badger/operation/common.go 76.92% 3 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5496      +/-   ##
==========================================
+ Coverage   55.65%   55.68%   +0.02%     
==========================================
  Files        1041     1042       +1     
  Lines      101935   101988      +53     
==========================================
+ Hits        56729    56788      +59     
+ Misses      40846    40843       -3     
+ Partials     4360     4357       -3     
Flag Coverage Δ
unittests 55.68% <75.00%> (+0.02%) :arrow_up:



codecov-commenter avatar Mar 01 '24 21:03 codecov-commenter

Maybe add memory usage to the benchmarks posted here: B/op and allocs/op columns.

fxamacker avatar Mar 04 '24 18:03 fxamacker

> Initially I tried the Snappy options from badger (https://github.com/onflow/flow-go/pull/4495); however, the benchmark showed no difference, the data was still uncompressed.

Probably it was still in the vlog (not yet compacted); badger compresses when compacting, in the background. But this approach is much better; the other one is really hard to benchmark.

bluesign avatar Mar 04 '24 18:03 bluesign

Closing for now, since I ran into an issue implementing the migration of all the badger values to the compressed format. Will reopen once https://github.com/onflow/flow-go/pull/5627 is open and merged.

zhangchiqing avatar Apr 10 '24 16:04 zhangchiqing