
Problem: db size increases too fast

Open yihuang opened this issue 2 years ago • 64 comments

Investigate to see if there is any low-hanging fruit to reduce the db size.

yihuang avatar May 03 '22 09:05 yihuang

For reference:

939G	application.db
42G	blockstore.db
1.0G	cs.wal
46M	evidence.db
4.0K	priv_validator_state.json
47M	snapshots
81G	state.db
238G	tx_index.db

yihuang avatar May 03 '22 09:05 yihuang

Remove tx_index.db

Currently we rely on the tx indexer to query a tx by its eth tx hash. An alternative solution is to store that index in a standalone kv db on the app side, so we don't need to retain all the tx indexes.
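
For illustration only, a rough sketch of what such an app-side index could look like, assuming a small dedicated tm-db store (the db name "ethtxindex" and the value encoding here are made up):

// Sketch: persist ethTxHash -> (block height, tx index) in a standalone kv db,
// so the Tendermint tx indexer no longer has to be kept around just for
// eth_getTransactionByHash-style lookups.
package main

import (
	"encoding/binary"

	dbm "github.com/tendermint/tm-db"
)

// indexValue packs block height and tx index into a fixed 16-byte value.
func indexValue(height int64, txIndex uint64) []byte {
	buf := make([]byte, 16)
	binary.BigEndian.PutUint64(buf[:8], uint64(height))
	binary.BigEndian.PutUint64(buf[8:], txIndex)
	return buf
}

func main() {
	// a dedicated db next to application.db; name and backend are illustrative
	db, err := dbm.NewDB("ethtxindex", dbm.GoLevelDBBackend, "data")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	ethTxHash := make([]byte, 32) // keccak256 hash of the eth tx
	if err := db.Set(ethTxHash, indexValue(1730000, 3)); err != nil {
		panic(err)
	}
}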

yihuang avatar May 03 '22 09:05 yihuang

RocksDB uses Snappy as the default compression algorithm. We can use LZ4 or other more aggressive (but possibly more resource-consuming) algorithms as its compression option. ref: https://github.com/facebook/rocksdb/wiki/Compression https://github.com/tendermint/tm-db/blob/d24d5c7ee87a2e5da2678407dea3eee554277c83/rocksdb.go#L33
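
For illustration, a minimal sketch of opening RocksDB with LZ4 instead of the default Snappy, assuming the linxGnu/grocksdb bindings (the gorocksdb-style API used by the linked tm-db version is very similar):

// Sketch: open RocksDB with LZ4 compression instead of the Snappy default.
package main

import "github.com/linxGnu/grocksdb"

func main() {
	opts := grocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	// LZ4Compression (or ZSTDCompression) trades CPU time against disk space
	// differently than the default SnappyCompression.
	opts.SetCompression(grocksdb.LZ4Compression)

	db, err := grocksdb.OpenDb(opts, "data/application.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()
}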

JayT106 avatar May 03 '22 19:05 JayT106

Remove tx_index.db

Currently we rely on the tx indexer to query a tx by its eth tx hash. An alternative solution is to store that index in a standalone kv db on the app side, so we don't need to retain all the tx indexes.

yep, we should consider using a new kvstore just for storing the tx hash mapping. Also we can disable the Tendermint indexer to increase consensus performance.

JayT106 avatar May 03 '22 20:05 JayT106

Remove tx_index.db

Currently we rely on the tx indexer to query a tx by its eth tx hash. An alternative solution is to store that index in a standalone kv db on the app side, so we don't need to retain all the tx indexes.

you mean nodes could choose not to have this tx_index.db by moving this part off-chain?

adu-crypto avatar May 04 '22 10:05 adu-crypto

Remove tx_index.db

Currently we rely on the tx indexer to query a tx by its eth tx hash. An alternative solution is to store that index in a standalone kv db on the app side, so we don't need to retain all the tx indexes.

you mean nodes could choose not to have this tx_index.db by moving this part off-chain?

yes, by storing the eth tx hash index in another place.

yihuang avatar May 04 '22 10:05 yihuang

I will start a testing build with a custom RocksDB setup to see how much it can be improved.

JayT106 avatar May 04 '22 21:05 JayT106

# IndexEvents defines the set of events in the form {eventType}.{attributeKey},
# which informs Tendermint what to index. If empty, all events will be indexed.
#
# Example:
# ["message.sender", "message.recipient"]
index-events = []

There's an option in app.toml to fine-tune which events to index.

The minimal one for json-rpc to work should be:

index-events = ["ethereum_tx.ethereumTxHash", "ethereum_tx.txIndex"]

EDIT: ethereum_tx.txIndex is necessary too.

yihuang avatar May 05 '22 03:05 yihuang

For reference:

939G	application.db
42G	blockstore.db
1.0G	cs.wal
46M	evidence.db
4.0K	priv_validator_state.json
47M	snapshots
81G	state.db
238G	tx_index.db

At which block height was this DB size observed?

JayT106 avatar May 06 '22 14:05 JayT106

Looks like lz4 might be working: the current application.db of the testing node at block height 1730K is around 511G. Projecting to today's block height (2692K), it will be around 755G. At the same time, the application.db of the full node with RocksDB using Snappy is 1057G. Roughly a 25% space saving.

Wait until the testing node fully syncs up to the network and see the final result.

JayT106 avatar May 09 '22 21:05 JayT106

@tomtau mentioned we could do some statistics on the application.db to see what kinds of data occupy the most space, then see if there's any waste that can be saved in the corresponding modules. For example, iterate the iavl tree and sum the value lengths under each module prefix.
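
As an illustration of that kind of survey, a rough sketch that groups the raw application.db keys by the multistore's "s/k:<module>/" prefix instead of walking each iavl tree (backend, path and names are assumptions; run it against a stopped node or a copy of the data):

// Sketch: coarse per-module space statistics over the raw application.db.
package main

import (
	"fmt"
	"strings"

	dbm "github.com/tendermint/tm-db"
)

func main() {
	db, err := dbm.NewDB("application", dbm.GoLevelDBBackend, "data")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	type stat struct{ count, keyBytes, valBytes int64 }
	stats := map[string]*stat{}

	itr, err := db.Iterator(nil, nil) // full scan, can take a long time
	if err != nil {
		panic(err)
	}
	defer itr.Close()

	for ; itr.Valid(); itr.Next() {
		key := string(itr.Key())
		module := "other"
		// rootmulti prefixes each module store with "s/k:<name>/"
		if strings.HasPrefix(key, "s/k:") {
			if end := strings.Index(key[4:], "/"); end >= 0 {
				module = key[4 : 4+end]
			}
		}
		s, ok := stats[module]
		if !ok {
			s = &stat{}
			stats[module] = s
		}
		s.count++
		s.keyBytes += int64(len(itr.Key()))
		s.valBytes += int64(len(itr.Value()))
	}

	for module, s := range stats {
		fmt.Printf("%s: %d pairs, keys %d bytes, values %d bytes\n", module, s.count, s.keyBytes, s.valBytes)
	}
}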

yihuang avatar May 17 '22 09:05 yihuang

BTW, this is the pruning=default node size (thanks @allthatjazzleo):

535G	/chain/.cronosd/data/application.db
20K	/chain/.cronosd/data/snapshots
44G	/chain/.cronosd/data/blockstore.db
120G	/chain/.cronosd/data/state.db
312G	/chain/.cronosd/data/tx_index.db
20K	/chain/.cronosd/data/evidence.db
1023M	/chain/.cronosd/data/cs.wal
1011G	/chain/.cronosd/data/

Compared to the full archive one:

1.1T	/chain/.cronosd/data/application.db
79M	/chain/.cronosd/data/snapshots
47G	/chain/.cronosd/data/blockstore.db
90G	/chain/.cronosd/data/state.db
260G	/chain/.cronosd/data/tx_index.db
78M	/chain/.cronosd/data/evidence.db
1.1G	/chain/.cronosd/data/cs.wal
1.5T	/chain/.cronosd/data/

yihuang avatar May 17 '22 10:05 yihuang

BTW, this is the pruning=default node size (thanks @allthatjazzleo):

535G	/chain/.cronosd/data/application.db
20K	/chain/.cronosd/data/snapshots
44G	/chain/.cronosd/data/blockstore.db
120G	/chain/.cronosd/data/state.db
312G	/chain/.cronosd/data/tx_index.db
20K	/chain/.cronosd/data/evidence.db
1023M	/chain/.cronosd/data/cs.wal
1011G	/chain/.cronosd/data/

Compared to the full archive one:

1.1T	/chain/.cronosd/data/application.db
79M	/chain/.cronosd/data/snapshots
47G	/chain/.cronosd/data/blockstore.db
90G	/chain/.cronosd/data/state.db
260G	/chain/.cronosd/data/tx_index.db
78M	/chain/.cronosd/data/evidence.db
1.1G	/chain/.cronosd/data/cs.wal
1.5T	/chain/.cronosd/data/

pruning=default only keeps the last 100 states, so it will be good for running a node without query functions.

JayT106 avatar May 17 '22 14:05 JayT106

Got the testing node synced up to the planned upgrade height. Using the default:

1057776M	./application.db
45714M	./blockstore.db
88630M	./state.db

Using lz4:

1058545M	./application.db
47363M	./blockstore.db
88633M	./state.db

It matches the benchmark in this article: there is no gain in compression ratio, only gains in compression/decompression speed.
https://morotti.github.io/lzbench-web/?dataset=canterbury/alice29.txt&machine=desktop

JayT106 avatar May 17 '22 14:05 JayT106

why is state.db larger in the pruned one? (120GB vs 90GB)

tomtau avatar May 18 '22 01:05 tomtau

Went through the application.db and got some basic statistics (at height 2933002; sizes are raw data lengths): the evm and ibc modules use most of the store space in the database, which is not surprising. Will look at more details in these modules.

evm: ~24.6M kv pairs, keySizeTotal: ~1.3G, valueSizeTotal: ~976M, avg key size: 52, avg value size: 39
ibc: ~2.6M kv pairs, keySizeTotal: ~149M, valueSizeTotal: ~58M, avg key size: 57, avg value size: 22

JayT106 avatar May 26 '22 14:05 JayT106

Another related thing: in v0.6.x we had a minor issue where contract suicide doesn't really delete the code and storage. Not sure how much impact that has on the db size though.

yihuang avatar May 26 '22 16:05 yihuang

It feels like ibc shouldn't store so many pairs; can you see the prefixes?

yihuang avatar May 26 '22 16:05 yihuang

The major key patterns in the ibc store:

acks/ports/transfer/channels/channel-0/sequences/...  counts 1003777
receipts/ports/transfer/channels/channel-0/sequences/...  counts 1003777
clients/07-tendermint-1/consensusStates/...  counts 403893
636C69656E74732F30372D74656E6465726D696E742D31... (hex of clients/07-tendermint-1)  counts 134631

JayT106 avatar May 26 '22 18:05 JayT106

I guess some historical (i.e. older than "evidence age") states, acks, receipts... could be pruned from ibc application storage? Do you have a more detailed breakdown of evm?

tomtau avatar May 27 '22 01:05 tomtau

https://github.com/cosmos/ibc-go/blob/release/v2.2.x/modules/light-clients/07-tendermint/types/update.go#L137 for the consensusStates there is pruning logic, but it only deletes at most one item at a time. We might need to check how many expired ones there are currently.

the sequence keys don't seem to be pruned at all.

yihuang avatar May 27 '22 02:05 yihuang

Do you have a more detailed breakdown of evm?

Working on it. The evm store stores:

1: code, where the key is the prefix 01 + codehash (this part should be fine)
2: storage, where the key is the prefix 02 + eth account address + hash of something (trying to figure out)

JayT106 avatar May 27 '22 02:05 JayT106

The EVM module's storage schema is much simpler: just contract code and storage, and the storage slots are computed by the evm internally. I guess there's not much to prune there.

yihuang avatar May 27 '22 02:05 yihuang

2: storage, the key will be the prefix 02 + eth account address + hash of something (trying to figure out)

it's the storage slot number, computed internally by the evm.
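
For context, for a Solidity mapping each entry lives at keccak256(key ++ declaration slot), which is why the "hash of something" after the 02 + address prefix looks opaque. A small illustrative sketch of that computation and of the resulting store key (not the module's code):

// Sketch: compute the storage slot of mapping[key] and the resulting store key.
package main

import (
	"fmt"

	"golang.org/x/crypto/sha3"
)

// mappingSlot returns keccak256(key ++ baseSlot), the slot of mapping[key]
// for a mapping declared at baseSlot (both left-padded to 32 bytes).
func mappingSlot(key, baseSlot [32]byte) [32]byte {
	h := sha3.NewLegacyKeccak256()
	h.Write(key[:])
	h.Write(baseSlot[:])
	var slot [32]byte
	copy(slot[:], h.Sum(nil))
	return slot
}

func main() {
	var key, baseSlot [32]byte
	key[31] = 0x01      // e.g. balances[0x...01], left-padded
	baseSlot[31] = 0x00 // mapping declared at slot 0

	slot := mappingSlot(key, baseSlot)

	// evm store key layout as described above: 02 || address(20) || slot(32)
	var addr [20]byte
	storeKey := append(append([]byte{0x02}, addr[:]...), slot[:]...)
	fmt.Printf("slot:      %x\nstore key: %x\n", slot, storeKey)
}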

yihuang avatar May 27 '22 02:05 yihuang

2: storage, the key will be the prefix 02 + eth account address + hash of something (trying to figure out)

it's the storage slot number, computed internally by the evm.

in the storage part, the address 1359135B1C9EB7393F75271E9A2B72FC0D055B2E has 382381 kv pairs; does it really store that many slots? https://cronos.org/explorer/address/0x1359135B1C9Eb7393f75271E9a2b72fc0d055B2E/transactions

JayT106 avatar May 27 '22 02:05 JayT106

2: storage, the key will be the prefix 02 + eth account address + hash of something (trying to figure out)

it's the storage slot number, computed internally by the evm.

in the storage part, the address 1359135B1C9EB7393F75271E9A2B72FC0D055B2E has 382381 kv pairs; does it really store that many slots? https://cronos.org/explorer/address/0x1359135B1C9Eb7393f75271E9a2b72fc0d055B2E/transactions

To verify that, we need the source code; the solidity compiler can output a storage layout file, which is helpful for verifying the slots.

yihuang avatar May 27 '22 02:05 yihuang

I just had an idea to trade some speed for disk space: currently, the storage format is like this:

02 + address{20} + slot1{32} -> value1{32}
02 + address{20} + slot2{32} -> value2{32}
...

Alternatively:

02 + address{20} + slotHighBits{20} -> {slotLowBits{12} -> value, ...}

It groups at most 4096 values into one KV pair, I guess it helps to reduce redundancy in the keys and intermediate overhead in the iavl tree.

It works best for the continuous storage regions in solidity contract, not so well for maps.
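
A minimal sketch of how that grouped layout could look, using the split sizes from the comment above (the encoding and names are purely illustrative):

// Sketch: pack storage slots sharing the same high bytes into one KV pair,
// keyed by 02 || address || slotHighBits, to cut per-entry key redundancy.
package main

import (
	"fmt"
	"sort"
)

const (
	prefixStorage = 0x02
	highLen       = 20 // leading slot bytes shared by the whole group
	lowLen        = 12 // trailing slot bytes distinguishing entries in the group
)

// groupKey is the key shared by all slots whose first highLen bytes match.
func groupKey(addr [20]byte, slot [32]byte) []byte {
	key := make([]byte, 0, 1+len(addr)+highLen)
	key = append(key, prefixStorage)
	key = append(key, addr[:]...)
	return append(key, slot[:highLen]...)
}

// encodeGroup flattens the group's (slotLowBits -> value) entries into one
// value blob, sorted so the encoding is deterministic.
func encodeGroup(entries map[[lowLen]byte][32]byte) []byte {
	lows := make([][lowLen]byte, 0, len(entries))
	for low := range entries {
		lows = append(lows, low)
	}
	sort.Slice(lows, func(i, j int) bool { return string(lows[i][:]) < string(lows[j][:]) })

	out := make([]byte, 0, len(entries)*(lowLen+32))
	for _, low := range lows {
		val := entries[low]
		out = append(out, low[:]...)
		out = append(out, val[:]...)
	}
	return out
}

func main() {
	var addr [20]byte
	var slot [32]byte
	slot[31] = 0x07 // slot 7 of a contract's continuous storage region

	var low [lowLen]byte
	copy(low[:], slot[highLen:])

	fmt.Printf("group key:   %x\n", groupKey(addr, slot))
	fmt.Printf("group value: %x\n", encodeGroup(map[[lowLen]byte][32]byte{low: {0x01}}))
}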

yihuang avatar May 27 '22 02:05 yihuang

go-ethereum stores each contract's state in an independent trie, so their structure is like this:

accounts-trie:
  contract address -> rootHash

rootHash:
  slot1 -> value1
  slot2 -> value2

Much less redundancy there.

yihuang avatar May 27 '22 03:05 yihuang

Another related thing: in v0.6.x we had a minor issue where contract suicide doesn't really delete the code and storage. Not sure how much impact that has on the db size though.

Does our indexer know how many (and which) contracts have been suicided? From the datastore we cannot see which contracts were suicided.

JayT106 avatar May 27 '22 19:05 JayT106

https://github.com/cosmos/ibc-go/blob/release/v2.2.x/modules/light-clients/07-tendermint/types/update.go#L137 for the consensusStates there is pruning logic, but it only deletes at most one item at a time. We might need to check how many expired ones there are currently.

the sequence keys don't seem to be pruned at all.

The active consensusStates count is 102 (block height around 3M), and it seems to be about the same amount at height 2M. Maybe we can prune a lot of kv pairs?

Yes, the latest sequence # in the test data is 1038570 (and I got a count of 1003777, not sure why some sequence #s are missing). Can we prune these values if we don't need the data anymore? Each key-value pair is around 80 bytes.

JayT106 avatar May 30 '22 19:05 JayT106