
block.state.commit is slow

Open majecty opened this issue 6 years ago • 3 comments

I found that the block.state.commit function is slow. In the Corgi network, when a block has a transaction that splits one UTXO into 100 UTXOs, block.state.commit takes 7 seconds when called by the proposer and 5 seconds when called by the other validators.

I added logs to find the slow function. You can see the logs at this link.
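For reference, the instrumentation is just wall-clock timing around the suspect calls. A minimal sketch of the idea; the names here are illustrative placeholders, not CodeChain's actual logger macros or the real commit function:

```rust
use std::time::{Duration, Instant};

// Placeholder standing in for block.state.commit; the sleep simulates the
// state-trie write-back discussed later in this thread.
fn commit_state() {
    std::thread::sleep(Duration::from_millis(10));
}

fn main() {
    let start = Instant::now();
    commit_state();
    // The real logs go through CodeChain's logger; println! keeps this self-contained.
    println!("commit state took {:?}", start.elapsed());
}
```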

Here are the sample logs. In the proposer's log, note the 7-second jump between the commit state line (10:51:27) and the Set transactions root line that follows it (10:51:34); in the validator's log, the Before lock block and After lock block lines are 5 seconds apart.

// in proposer
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 TRACE miner Adding transaction 0x7a2813b09fc75f31e94f10af6f8e80d4fdf9e7ab3a856852a5c76a75b5b62018 took 13.64375ms
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Pushed 1/1 transactions
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Get parent hash
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Get parent header
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Get parent view
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Get transactions root
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Clone block state
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner Call on_close_block
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 TRACE state add_balance(0x6850…65e1, 50): 2639520825
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 TRACE state add_balance(0x6202…8750, 19): 1300650642
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 TRACE state add_balance(0x263b…fcb1, 30): 1762899421
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 TRACE state add_balance(0x0968…b3f6, 1): 919023814
#105317671 2019-07-25 10:51:27 UTC Client Worker #2 DEBUG miner commit state

#105317671 2019-07-25 10:51:34 UTC Client Worker #2 DEBUG miner Set transactions root
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 DEBUG miner set state root
#105317671 2019-07-25 10:51:34 UTC event.loop INFO rpc RPC call(slog([]))
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 DEBUG miner Close block
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 DEBUG miner Before get write access of mem_pool
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 DEBUG miner After get write access of mem_pool
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 TRACE mem_pool remove() called, time: 2126657, timestamp: 1564051884
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 TRACE miner update_sealing: engine indicates internal sealing
#105317671 2019-07-25 10:51:34 UTC Client Worker #2 TRACE miner seal_block_internally: attempting internal seal.
#105317671 2019-07-25 10:51:34 UTC tendermint TRACE engine Proposer nonce: 26
// in another validator
#608329186 2019-07-25 11:17:25 UTC Blocks Verifier #1 DEBUG miner Before lock block 2126978

#608329186 2019-07-25 11:17:30 UTC Blocks Verifier #1 DEBUG miner After lock block 2126978

majecty · Aug 14 '19 01:08

I found out that the root cause of this problem is the speed of the ArchiveDB::get function, which is called while reading trie nodes of the state DB. The abstract call sequence is as follows:

TopLevelState::commit -> ShardCache::commit -> asset.commit -> WriteBack::commit -> TrieDBMut::insert -> TrieDBMut::insert_aux -> ArchiveDB::get

See these lines for reference:
https://github.com/CodeChain-io/codechain/blob/48a3b880f96a345d47acc6a48902fb43a06702ef/util/merkle/src/triedbmut.rs#L73
https://github.com/CodeChain-io/codechain/blob/48a3b880f96a345d47acc6a48902fb43a06702ef/util/journaldb/src/archivedb.rs#L81

It sometimes gets really slow (about 1 s) when a cache miss happens, but that isn't always the case.
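To make the shape of the problem concrete, here is a toy sketch (illustrative types, not CodeChain's actual ones) of why a single trie insert can trigger several ArchiveDB::get calls: rewriting the path for a new key means loading every node along that path first, and each load that misses the in-memory overlay falls through to RocksDB:

```rust
use std::collections::HashMap;

// Toy stand-in for ArchiveDB: an in-memory overlay that, in the real code,
// falls through to RocksDB on a cache miss (the slow path observed here).
struct ArchiveDb {
    cache: HashMap<String, Vec<u8>>,
}

impl ArchiveDb {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.cache.get(key).cloned() // real code: backing RocksDB read on miss
    }
}

// Toy stand-in for TrieDBMut.
struct TrieDbMut<'a> {
    db: &'a ArchiveDb,
}

impl<'a> TrieDbMut<'a> {
    // Inserting one key rewrites every node on the path from the root to the
    // leaf; each node must first be loaded through ArchiveDb::get.
    fn insert(&mut self, path: &[&str], _value: &[u8]) -> usize {
        let mut reads = 0;
        for &node_key in path {
            let _node = self.db.get(node_key); // potential cache miss
            reads += 1;
        }
        reads
    }
}

fn main() {
    let db = ArchiveDb { cache: HashMap::new() };
    let mut trie = TrieDbMut { db: &db };
    // 100 new UTXOs -> 100 inserts, each reading every node on its path.
    let total: usize = (0..100)
        .map(|_| trie.insert(&["root", "branch", "leaf"], b"utxo"))
        .sum();
    println!("{} backing-store reads for 100 inserts", total);
}
```

With a transaction that creates 100 UTXOs, this multiplies into hundreds of backing-store reads inside a single block.state.commit call.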

remagpie · Aug 26 '19 06:08

I investigated further down the implementation and found that this line takes a long time in this situation: https://github.com/CodeChain-io/codechain/blob/48a3b880f96a345d47acc6a48902fb43a06702ef/util/kvdb-rocksdb/src/lib.rs#L570

I think RocksDB's read performance is the issue here.
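For anyone who wants to reproduce the measurement, here is a rough micro-benchmark sketch of a RocksDB point read using the rocksdb crate directly (not the exact kvdb-rocksdb wrapper CodeChain uses; the path and keys are made up):

```rust
use rocksdb::{Options, DB};
use std::time::Instant;

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "/tmp/rocksdb-read-bench")?;
    db.put(b"some-trie-node-hash", b"node-body")?;

    // Time a single point read; on an HDD, seek latency dominates this number.
    let start = Instant::now();
    let _value = db.get(b"some-trie-node-hash")?;
    println!("point read took {:?}", start.elapsed());
    Ok(())
}
```

Note that a freshly written key will usually still be in the memtable or OS page cache, so a realistic measurement needs a cold cache and a dataset larger than RAM.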

remagpie · Aug 26 '19 07:08

The problem went away when I switched the underlying disk from an HDD to an SSD, so it seems this problem only affects machines with rotating disks. That would fit the picture above: trie nodes are keyed by hash, so the reads are effectively random access, which is the worst case for a rotating disk.
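A quick way to see the hardware difference, independent of RocksDB, is to time random reads over a large file (the path is a placeholder; use a file larger than RAM so the page cache doesn't hide the disk):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let mut f = File::open("/path/to/large/file")?;
    let len = f.metadata()?.len();
    assert!(len > 4096, "need a file larger than one block");
    let span = len - 4096;
    let mut buf = [0u8; 4096];
    let mut seed: u64 = 0x9e37_79b9_7f4a_7c15;

    let start = Instant::now();
    for _ in 0..100 {
        // xorshift64 for a cheap pseudo-random offset
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        f.seek(SeekFrom::Start(seed % span))?;
        f.read_exact(&mut buf)?;
    }
    // Each random read pays a seek on an HDD (~10 ms) but not on an SSD (~0.1 ms).
    println!("100 random 4 KiB reads took {:?}", start.elapsed());
    Ok(())
}
```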

remagpie · Aug 26 '19 08:08