
Problem: memiavl snapshot doesn't have any compression

Open yihuang opened this issue 2 years ago • 4 comments

Currently, for simplicity, the snapshot format is plain data without any compression. Compression is important for reducing the snapshot size.

  • nodes: each node is just a bunch of integers together with a 32-byte hash, so there are lots of zero bytes to compress. We can compress each node independently and add a 1-byte length prefix; nodes are referenced by file offset. Candidates:
    • ~~RLE~~
    • ~~Cap'n Proto packing schema~~
    • simply concatenated varints; this is not strictly random access, but skipping varints seems fast enough:
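      def skipVarInt(buf, n):
        'return the offset just past the first n varint encoded integers in buffer'
        if n == 0:
          return 0
        for offset, b in enumerate(buf):
          # a byte with the high bit clear is the last byte of a varint
          if b>>7 == 0:
            n -= 1
            if n == 0:
              # offset is the last byte of the n-th varint, so skip past it
              return offset + 1
        raise Exception('buffer exhausted')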
      
    • stream vbyte
  • keys: a bunch of short, ordered byte strings that are accessed frequently. Delta encoding should be efficient here, but then we need to organize the data in small fixed-size chunks and support looking up a key by index rather than by uncompressed file offset.
  • values: unordered and less frequently accessed. We can apply some generic random-access compression such as the zstd seekable (seek table) format, and still support lookup by uncompressed file offset.
    • In IAVL modify operations, the values in the snapshot are not used at all. If queries are taken over by versiondb, the values are rarely needed (maybe in proof generation?); we could even look up the value field from versiondb using (node.key, node.version), if versiondb is closely integrated with the IAVL tree.
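
As a rough illustration of the keys idea above, here is a minimal sketch that reads "delta encoding" as prefix (front) coding: each key stores only the suffix that differs from the previous key in its chunk. Everything named here (CHUNK_KEYS, encode_chunk, encode_keys, lookup) is hypothetical and not part of memiavl. The sketch groups a fixed number of keys per chunk and keeps chunks in a Python list; a real snapshot file would instead need fixed-size byte chunks or an offset table to locate them.

```python
CHUNK_KEYS = 4  # hypothetical: number of keys stored per chunk


def shared_prefix_len(a, b):
    """Length of the common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_chunk(keys):
    """Front-code one chunk: each key is (shared_len, suffix_len, suffix)."""
    out = bytearray()
    prev = b""  # the first key shares nothing, so it is stored whole
    for key in keys:
        shared = shared_prefix_len(prev, key)
        suffix = key[shared:]
        # one length byte each, so this sketch assumes keys under 256 bytes
        assert len(key) < 256
        out += bytes([shared, len(suffix)]) + suffix
        prev = key
    return bytes(out)


def encode_keys(sorted_keys):
    """Split sorted keys into chunks of CHUNK_KEYS and encode each chunk."""
    return [
        encode_chunk(sorted_keys[i:i + CHUNK_KEYS])
        for i in range(0, len(sorted_keys), CHUNK_KEYS)
    ]


def lookup(chunks, index):
    """Return the key at position index, decoding only the chunk that holds it."""
    chunk = chunks[index // CHUNK_KEYS]
    pos, key = 0, b""
    for _ in range(index % CHUNK_KEYS + 1):
        shared, suffix_len = chunk[pos], chunk[pos + 1]
        pos += 2
        key = key[:shared] + chunk[pos:pos + suffix_len]
        pos += suffix_len
    return key
```

A lookup by index decodes at most CHUNK_KEYS entries, which is the trade-off the chunking is meant to bound: smaller chunks mean faster lookups but less shared prefix to exploit.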

yihuang avatar Jan 28 '23 02:01 yihuang

Deprioritizing this one to avoid premature optimization; uncompressed nodes have a fixed size, which has the advantage of simplicity.

yihuang avatar Feb 13 '23 10:02 yihuang

Now that the simpler design is about to be released, a more sophisticated version can be considered (if we really want to go down this rabbit hole) ;D

yihuang avatar Jun 26 '23 04:06 yihuang

It seems filesystem-level compression works well with memiavl, so we probably don't need to worry about this issue at all.

$ sudo compsize -x /chain/.chain-maind/data/memiavl.db
Processed 865 files, 320027 regular extents (320027 refs), 416 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       41%       23G          56G          56G
none       100%       17G          17G          17G
zstd        14%      5.8G          38G          38G
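
For reading the table above: the Perc column is disk usage divided by uncompressed size. A small sketch that recomputes it from the displayed figures follows; the names usage and percent_on_disk are made up for this example. The sizes compsize prints are rounded, so a recomputed value (the zstd row in particular) can differ by a point from the printed percentage.

```python
# Figures copied from the compsize output above, in G as displayed (rounded).
usage = {
    "TOTAL": (23.0, 56.0),  # (disk usage, uncompressed)
    "none": (17.0, 17.0),
    "zstd": (5.8, 38.0),
}


def percent_on_disk(disk, uncompressed):
    """Share of the uncompressed size that actually occupies disk space."""
    return 100.0 * disk / uncompressed


for name, (disk, raw) in usage.items():
    print(f"{name:<6} {percent_on_disk(disk, raw):5.1f}%  saved {raw - disk:.1f}G")
```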

yihuang avatar Jul 11 '23 09:07 yihuang

Do Not Worry About.

On Tue, 11 July 2023, 7:22 pm yihuang, @.***> wrote:

it seems filesystem level compression works with mmap, if it works well, we probably don't need to worry about this issue at all.


Bunsen1990 avatar Jul 13 '23 11:07 Bunsen1990