
Determine file encoding for initial state

segfault-magnet opened this issue · 2 comments

Current development on the regenesis feature has split off the initial state from the chain config. This buys us some flexibility to select an appropriate encoding for the initial state file.

Some points to consider:

  • Storage -- how much overhead does the encoding have?
  • Cursor -- how quickly can we continue decoding the file when resuming the regenesis?
  • Performance -- both for decoding and encoding
  • Compression -- encoding overhead can be reduced by compression. On the other hand, keeping a cursor to the last loaded element might get complicated. If compression is used, it should ideally be a well-known algorithm without system dependencies.
  • Human readability -- keeping the file human readable helps with manipulating it, developing tools to analyze it, and testing.
  • Library maturity -- general support for encoding/decoding and also rust-specific support.
  • Streaming -- can we encode/decode on the fly? What is the API like? Does the library handle reading/saving the data, or do we?

segfault-magnet · Oct 04 '23

An update:

Wrote a small utility to benchmark our use case.

Our expected data volume is enormous -- a 1TB rocksdb database at the moment of regenesis. Benchmarking this much data directly became problematic, so I resorted to a mixed approach of benchmarks and linear regression.

Testing was done in memory to remove the storage speed variable.

The contract state is expected to reach numbers of around 100M entries, which means a single ContractConfig might take up 6.4GB of memory when loaded (64B per entry, or more depending on decoding overhead).

This made us uneasy about keeping the state and balances collections inside ContractConfig.

We agreed that ContractConfig would need some flattening. We considered encoding the state and balances into separate files, but ultimately decided against it because of the overhead of matching the contract state and balances back to the contract itself: a 'foreign key' would have to be saved with each storage or balance entry. If we don't enforce an ordering while encoding, then a columnar file layout would be needed so that we can locate all of a contract's state and balances when parsing a single contract. We briefly considered using the rocksdb snapshot (or sled, or mysql) as the target format itself but didn't pursue the idea further.

An alternative is to introduce a somewhat custom layout that fits our needs exactly. We're currently placing the following requirements on whatever data format is chosen:

  • First all CoinConfigs are encoded,
  • Then all MessageConfigs are encoded,
  • Finally, all ContractConfigs are encoded. After each ContractConfig, all of its state and balance entries must follow (in any order) before the next ContractConfig begins.

The ContractConfig was broken down into 3 structures:

  1. A ContractState representing one storage slot (a (Bytes32, Bytes32))
  2. A ContractBalance representing the balance of the contract for a particular AssetId (a (Bytes32, u64))
  3. ContractConfig -- the original fields minus the above (see the sketch below).
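
For illustration, roughly the shapes involved (a minimal sketch in Rust; field names beyond the key/value and asset/amount pairs described above are hypothetical, not the exact fuel-core definitions):

```rust
type Bytes32 = [u8; 32];

/// One storage slot of a contract: a (Bytes32, Bytes32).
struct ContractState {
    key: Bytes32,
    value: Bytes32,
}

/// Balance of a contract for a particular AssetId: a (Bytes32, u64).
struct ContractBalance {
    asset_id: Bytes32,
    amount: u64,
}

/// The original ContractConfig minus the state/balances collections;
/// those entries follow it in the encoded stream instead.
struct ContractConfig {
    contract_id: Bytes32,
    code: Vec<u8>,
    salt: Bytes32,
    // ...remaining per-contract fields
}
```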

An example:

COIN
... (coins)
COIN
MESSAGE
... (messages)
MESSAGE
CONTRACT#1
STATE
STATE
BALANCE
STATE
BALANCE
CONTRACT#2
...
...

Test data

Randomly generated Coins, Messages, and Contracts in the above-described ordering, e.g. 1k test entries = 333 coins, 333 messages, and 333 contracts (each contract with 10k state entries and 100 balance entries).

Other formats

  • Didn't consider any column-oriented formats (such as Parquet) since we don't benefit from them.
  • Didn't consider any formats that need schema code generation (such as Protobuf, Cap'n Proto, etc.).
  • Didn't consider formats whose Rust implementation is poorly maintained.

Compression

Each test is run with and without compression.

Used native Zstd at its minimal compression level (sketched below).
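
A minimal sketch of how that compression layer might wrap any writer, assuming the zstd crate with level 1 as the "minimal" setting (not necessarily the exact setup used in the benchmarks):

```rust
use std::io::Write;

/// Wrap any `Write` sink in a zstd compressor at the lowest level (1).
/// The returned writer finishes the zstd frame automatically on drop.
fn compressed_writer<W: Write>(inner: W) -> std::io::Result<impl Write> {
    Ok(zstd::stream::write::Encoder::new(inner, 1)?.auto_finish())
}
```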

JSON (serde_json)

Pros:

  • Mature, popular
  • Serde compatible
  • Human readable
  • Can be streamed (json lines; see the sketch below)

Cons:

  • We don't need the schema embedded in the data; the repeated field names add encoding overhead
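
A minimal sketch of the JSON-lines style of streaming, assuming serde_json and a simplified, hypothetical CoinConfig (the real type has more fields):

```rust
use serde::{Deserialize, Serialize};
use std::io::{BufRead, BufReader, BufWriter, Write};

// Hypothetical, simplified entry -- the real CoinConfig has more fields.
#[derive(Serialize, Deserialize)]
struct CoinConfig {
    owner: [u8; 32],
    amount: u64,
}

fn write_coins(path: &str, coins: &[CoinConfig]) -> Result<(), Box<dyn std::error::Error>> {
    let mut out = BufWriter::new(std::fs::File::create(path)?);
    for coin in coins {
        serde_json::to_writer(&mut out, coin)?;
        out.write_all(b"\n")?; // one entry per line
    }
    out.flush()?;
    Ok(())
}

fn read_coins_after(path: &str, cursor: usize) -> Result<Vec<CoinConfig>, Box<dyn std::error::Error>> {
    let reader = BufReader::new(std::fs::File::open(path)?);
    reader
        .lines()
        .skip(cursor) // resume after the last processed entry
        .map(|line| -> Result<CoinConfig, Box<dyn std::error::Error>> {
            Ok(serde_json::from_str(&line?)?)
        })
        .collect()
}
```

Because every entry sits on its own line, the cursor can simply be the index (or byte offset) of the last processed line.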

Measurements taken for 10k, 20k, ..., 100k entries. Using linear regression to predict usage for up to 1B entries:

Storage

[chart: storage_requirements]

400GB without compression, 255GB with compression.

Encoding performance

[chart: encoding_time]

Around 15m for uncompressed JSON, 1h 8m for compressed.

Decoding performance

[chart: decoding_time]

30 minutes for uncompressed JSON, 43 minutes for compressed.

Bincode

Pros:

  • Mature
  • Serde compatible
  • No schema
  • Can be streamed (just encode elements one after another; sketched below)

Cons:

  • Not human-readable
  • Manual streaming -- collections require their size to be encoded beforehand
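
A minimal sketch of that manual streaming with the bincode 1.x serde API: each element is written as its own frame, and decoding runs until the reader is exhausted (the entry type is a simplified stand-in, not the real config type):

```rust
use serde::{Deserialize, Serialize};
use std::io::{BufRead, BufReader, BufWriter, Write};

// Hypothetical, simplified entry type standing in for the real configs.
#[derive(Serialize, Deserialize)]
struct StateEntry {
    key: [u8; 32],
    value: [u8; 32],
}

fn write_entries(path: &str, entries: &[StateEntry]) -> Result<(), Box<dyn std::error::Error>> {
    let mut out = BufWriter::new(std::fs::File::create(path)?);
    for entry in entries {
        // No length prefix for a whole collection: each element is its own
        // bincode frame, so writing (and later reading) can resume per element.
        bincode::serialize_into(&mut out, entry)?;
    }
    out.flush()?;
    Ok(())
}

fn read_entries(path: &str) -> Result<Vec<StateEntry>, Box<dyn std::error::Error>> {
    let mut input = BufReader::new(std::fs::File::open(path)?);
    let mut entries = Vec::new();
    // Keep decoding frames until the buffered reader has no more bytes.
    while !input.fill_buf()?.is_empty() {
        entries.push(bincode::deserialize_from(&mut input)?);
    }
    Ok(entries)
}
```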

Measurements taken for 10k, 20k, ..., 100k entries. Using linear regression to predict usage for up to 1B entries:

Storage

[chart: storage_requirements]

Around 290GB uncompressed, 240GB compressed.

Encoding performance

[chart: encoding_time]

Around 13 minutes uncompressed, and around 46 minutes compressed.

Decoding performance

[chart: decoding_time]

27 minutes uncompressed, around 37 minutes compressed.

BSON

Pros:

  • Mature
  • Serde compatible
  • Can be streamed

Cons:

  • Has overhead for the schema (field names are embedded in every document)
  • The interface doesn't allow encoding into a writer -- extra allocations (see the sketch below)
  • No native support for unsigned types; care is needed with u64 and larger types since they can't fit into the i64 that BSON supports
  • Manual streaming -- collections can only be encoded in-memory
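
A minimal sketch of the per-element BSON path through serde (assuming the bson 2.x crate; the entry type is hypothetical). Note the intermediate buffer that `bson::to_vec` forces, which is the extra-allocation point mentioned above:

```rust
use serde::{Deserialize, Serialize};
use std::io::Write;

// Hypothetical entry; `amount` is kept as i64 because BSON has no u64,
// so values above i64::MAX would need special handling.
#[derive(Serialize, Deserialize)]
struct ContractBalanceEntry {
    asset_id: [u8; 32],
    amount: i64,
}

fn append_balance<W: Write>(
    out: &mut W,
    entry: &ContractBalanceEntry,
) -> Result<(), Box<dyn std::error::Error>> {
    let bytes = bson::to_vec(entry)?; // serialize into an owned buffer first
    out.write_all(&bytes)?; // then copy it into the writer
    Ok(())
}
```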

Measurements taken for 10k, 20k, ..., 100k entries. Using linear regression to predict usage for up to 1B entries:

Storage

[chart: storage_requirements]

Around 400GB uncompressed, around 250GB compressed.

Encoding performance

[chart: encoding_time]

Around 15m uncompressed, around 45m compressed.

Decoding performance

[chart: decoding_time]

33 minutes uncompressed, 46 minutes compressed.

Bench summary graph

[charts: storage_requirements, encoding_time, decoding_time]

Compression impact on cursor

Seeking to a location in uncompressed data is fast. This is not the case for compressed data, since you have to decompress even the data you don't care about. There are workarounds should we need them.

The naive approach takes around 13 minutes to decompress and seek to the end of 400GB of compressed data.
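
A sketch of that naive approach, assuming the zstd crate: to resume at a given uncompressed offset, everything before the cursor still has to be decompressed and discarded, which is why it takes on the order of minutes.

```rust
use std::io::{self, Read};

/// Open a zstd-compressed file and advance to `offset` bytes of the
/// *uncompressed* stream, returning a reader positioned at the cursor.
fn seek_decompressed(path: &str, offset: u64) -> io::Result<impl Read> {
    let file = std::fs::File::open(path)?;
    let mut decoder = zstd::stream::read::Decoder::new(file)?;
    // Everything before the cursor is decompressed and thrown away.
    io::copy(&mut (&mut decoder).take(offset), &mut io::sink())?;
    Ok(decoder)
}
```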

Current summary: We're going with bincode + optional compression. Starting work on implementing the reading/writing logic.

segfault-magnet · Oct 10 '23

Tried one more format: Parquet. Even though we don't benefit from the columnar layout, Parquet has the advantage of encoding data in chunks (which solves the cursor problem even when compression is used).

It is, however, not Serde compatible and has poor support for deriving the encoding/decoding code.

Also, the columns of a file should represent one entity (and not multiple, like a Rust enum might). That means 5 files: coins, messages, contracts, contract_state, and contract_balance.
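
For illustration, a minimal sketch (not the actual fuel-core implementation) of how one of those five files could be written in chunks with the arrow/parquet crates; the schema, file name, and chunking are assumptions, and compression would be configured through the writer properties:

```rust
use std::{fs::File, sync::Arc};

use arrow::array::{ArrayRef, BinaryArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

/// Write contract-state (key, value) pairs into `contract_state.parquet`.
/// The writer buffers rows and flushes them into independent row groups
/// (boundaries controlled by its properties), which a reader can skip.
fn write_contract_state(
    chunks: &[Vec<([u8; 32], [u8; 32])>],
) -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("key", DataType::Binary, false),
        Field::new("value", DataType::Binary, false),
    ]));

    let file = File::create("contract_state.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;

    for chunk in chunks {
        let keys = BinaryArray::from_iter_values(chunk.iter().map(|(k, _)| k.as_slice()));
        let values = BinaryArray::from_iter_values(chunk.iter().map(|(_, v)| v.as_slice()));
        let columns: Vec<ArrayRef> = vec![Arc::new(keys), Arc::new(values)];
        let batch = RecordBatch::try_new(schema.clone(), columns)?;
        writer.write(&batch)?;
    }
    writer.close()?; // finalizes the row groups and the file footer
    Ok(())
}
```

Each row group can later be read (or skipped) independently, which is what makes Parquet cursor-friendly even with compression enabled.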

Measurements taken for 10k, 20k, ..., 100k entries. Using linear regression to predict usage for up to 1B entries:

Storage

[chart: storage_requirements]

Around 140 GB without compression, slightly more (+5GB) with compression (Gzip at its lowest level, just like all the other compression settings used).

Encoding performance

[chart: encoding_time]

Around 300s uncompressed and around 1250s compressed.

Decoding performance

[chart: decoding_time]

480s uncompressed and around 720s compressed.

All compared

Storage

[chart: storage_requirements]

Encoding performance

[chart: encoding_time]

Decoding performance

[chart: decoding_time]

It seems Parquet is a clear winner. Will proceed to use it instead of bincode for the regenesis.

segfault-magnet · Nov 07 '23

Done during https://github.com/FuelLabs/fuel-core/pull/1474

xgreenx · Jan 31 '24