cosmos-sdk icon indicating copy to clipboard operation
cosmos-sdk copied to clipboard

x/crisis: asserting crisis invariants distribution/can-withdraw takes a very long time

Open odeke-em opened this issue 4 years ago • 12 comments

From my profiling and sleuthing, I just noticed that starting a node using gaiad took more than 30 minutes and majority of the time was spent on module=x/crisis name=distribution/can-withdraw per

$ gaiad start --p2p.seeds "[email protected]:26657"
12:34AM INF starting ABCI with Tendermint
12:34AM INF Starting multiAppConn service impl=multiAppConn module=proxy
12:34AM INF Starting localClient service connection=query impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
12:34AM INF Starting EventBus service impl=EventBus module=events
12:34AM INF Starting PubSub service impl=PubSub module=pubsub
12:34AM INF Starting IndexerService service impl=IndexerService module=txindex
12:34AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=
12:34AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0
12:34AM INF asserting crisis invariants inv=0/12 module=x/crisis name=bank/nonnegative-outstanding
12:34AM INF asserting crisis invariants inv=1/12 module=x/crisis name=bank/total-supply
12:34AM INF asserting crisis invariants inv=2/12 module=x/crisis name=distribution/nonnegative-outstanding
12:34AM INF asserting crisis invariants inv=3/12 module=x/crisis name=distribution/can-withdraw
1:07AM INF asserting crisis invariants inv=4/12 module=x/crisis name=distribution/reference-count
1:07AM INF asserting crisis invariants inv=5/12 module=x/crisis name=distribution/module-account
1:07AM INF asserting crisis invariants inv=6/12 module=x/crisis name=liquidity/escrow-amount
1:07AM INF asserting crisis invariants inv=7/12 module=x/crisis name=staking/module-accounts
1:07AM INF asserting crisis invariants inv=8/12 module=x/crisis name=staking/nonnegative-power
1:07AM INF asserting crisis invariants inv=9/12 module=x/crisis name=staking/positive-delegation
1:07AM INF asserting crisis invariants inv=10/12 module=x/crisis name=staking/delegator-shares
1:08AM INF asserting crisis invariants inv=11/12 module=x/crisis name=gov/module-account
1:08AM INF asserted all invariants duration=2065579.553016 height=5200791 module=x/crisis
1:08AM INF created new capability module=ibc name=ports/transfer
1:08AM INF port binded module=x/ibc/port port=transfer
1:08AM INF claimed capability capability=1 module=transfer name=ports/transfer

In late February/March 2021 we had brought down start time to a very short period, but this is a huge increase. We need to investigate what's going wrong.

I've examined live traffic for the past 1.5 hours and a pain point seems to be the expected Screen Shot 2021-08-08 at 1 35 11 AM

I know there has been talk of replacing the code in iavl, which is why my team and I held from from optimizing it, but it's been 6+ months and perhaps it is time for us to dive in deep and figure out what could be improved.

/cc @cuonglm @kirbyquerby


For Admin Use

  • [x] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [ ] Contributor assigned/self-assigned

odeke-em avatar Aug 08 '21 08:08 odeke-em

@odeke-em which gaia version did you use?

cuonglm avatar Aug 09 '21 02:08 cuonglm

Ooooo yes, this is a very well known issue to the point we encourage operators to start with --skip-crisis-invariants. But that really defeats the purpose of having x/crisis as a module/function, so we should really strive to improve the performance of this. But I imagine not only the problem but also the improvements will be multi-faceted.

Also, it is true we're moving to SMT in place of IAVL, at that work is very active. I don't know estimated completion times (cc @aaronc), so based on the provided benchmark it seems to be completely IAVL related. Not sure if it's worth digging into this...?

alexanderbez avatar Aug 09 '21 13:08 alexanderbez

It will take few more months to have ADR-40 implemented and merged. IAVL and current store package won't be deprecated immediately and we will support both SMT and IAVL in the SDK for few more releases.

robert-zaremba avatar Aug 13 '21 14:08 robert-zaremba

Awesome, thank you @alexanderbez and @robert-zaremba for the go-ahead that we've been waiting for cosmos/iavl, for a while. We've finished auditing celestiaorg/smt (publishing the official report in a few hours) and now we can go forth and improving cosmos/iavl and cosmos/cosmos-sdk/store/* performance.

odeke-em avatar Aug 16 '21 08:08 odeke-em

Given what @robert-zaremba stated and if you have bandwidth and interest, it would be helpful to solve this or at least figure out where the major bottlenecks are @odeke-em 🙏

alexanderbez avatar Aug 16 '21 13:08 alexanderbez

Given what @robert-zaremba stated and if you have bandwidth and interest, it would be helpful to solve this or at least figure out where the major bottlenecks are @odeke-em 🙏

Roger that @alexanderbez, we are on it and yes we do have the bandwidth to work on it :-)

odeke-em avatar Aug 17 '21 09:08 odeke-em

Eureka! After experimenting this afternoon and evening with various tricks, I think I have an idea that'll definitely slash x/crisis time down from 30+ minutes to like less than a minute or so.

Prognosis

In hundreds of profiles, we see that this code is super hot and the cause of that delay https://github.com/cosmos/cosmos-sdk/blob/00f987ac67981173898a66f9b6b2401ff74854ec/store/cachekv/store.go#L191-L196

per

ROUTINE ======================== github.com/cosmos/cosmos-sdk/store/cachekv.(*Store).dirtyItems in github.com/cosmos/[email protected]/store/cachekv/store.go
     1.42s     35.02s (flat, cum) 60.06% of Total
         .          .    211:func (store *Store) dirtyItems(start, end []byte) {
         .          .    212:	unsorted := make([]*kv.Pair, 0)
         .          .    213:
         .          .    214:        startStr, endStr := byteSliceToStr(start), byteSliceToStr(end)
         .          .    215:	n := len(store.unsortedCache)
     240ms      4.67s    216:        for key := range store.unsortedCache {
     430ms      430ms    217:            if key == "" {
         .          .    218:                println("blank key")
         .          .    219:            }
         .          .    220:        }
     740ms         5s    221:	for key := range store.unsortedCache {
      10ms     24.79s    222:		if isKeyInDomain(key, startStr, endStr) {
         .       10ms    223:			cacheValue := store.cache[key]
         .       10ms    224:			unsorted = append(unsorted, &kv.Pair{Key: []byte(key), Value: cacheValue.value})
         .          .    225:		}
         .          .    226:	}
         .          .    227:
         .          .    228:	if len(unsorted) == n { // This pattern allows the Go compiler to emit the map clearing idiom for the entire map.
         .          .    229:		for key := range store.unsortedCache {
         .          .    230:			delete(store.unsortedCache, key)
         .          .    231:		}
         .          .    232:	} else { // Otherwise, normally delete the unsorted keys from the map.
         .          .    233:		for _, kv := range unsorted {
         .       10ms    234:			delete(store.unsortedCache, byteSliceToStr(kv.Key))
         .          .    235:		}
         .          .    236:	}
         .          .    237:
         .       50ms    238:	sort.Slice(unsorted, func(i, j int) bool {
         .       50ms    239:		return bytes.Compare(unsorted[i].Key, unsorted[j].Key) < 0
         .          .    240:	})
         .          .    241:
         .          .    242:	for e := store.sortedCache.Front(); e != nil && len(unsorted) != 0; {
         .          .    243:		uitem := unsorted[0]
         .          .    244:		sitem := e.Value

and the culprit is the byte comparisons from invoking dbm.IsKeyInDomain(start, end, key []byte) per https://github.com/tendermint/tm-db/blob/edb35ebc2ea9327e0c0528d0d66bbf39a787461b/util.go#L37-L46, where it essentially for every single key performs a search in unsorted keys, so O(n) in the number of unsorted keys, but then O(klm*n) in the length of start, end and each of the keys, which leads to the huge 30min plus time spent looking for keys within the domain.

Target

We literally want to look through all the unsorted keys and find the values within the range [start, end)

Suggestion

Instead of that chaotic and quadratic search, we can bring this down to somewhat linear time by: a) Sorting all the store.unsortedCache in a slice SL b) Using a modified binary search to find where in SL we have the first keys >= start, eStart c) Using a modified range binary search to find where in SL we have the first value less than end, eEnd d) Iterating just using those indices eStart and eEnd ranged over SL to retrieve the dirty items in range

From a preliminary experiment of walking through the logic and plugging it into store.go I can confirm that this will return results in a very short time. I am going to catch some sleep for now, but I'll send a PR in the morning and also test out speed to reconfirm

odeke-em avatar Aug 24 '21 07:08 odeke-em

@odeke-em are you still interested in working on this?

tac0turtle avatar Jul 28 '22 21:07 tac0turtle

@odeke-em are you still interested in working on this?

I did mail this PR https://github.com/cosmos/cosmos-sdk/pull/10024 but I was told the store backend was gonna change in a couple of months to use smt, so I didn't do anything else aside from auditing smt and performance improvements on it, but waited for it to land here. Has the transition been completed?

odeke-em avatar Jul 28 '22 21:07 odeke-em

I did mail this PR #10024 but I was told the store backend was gonna change in a couple of months to use smt, so I didn't do anything else aside from auditing smt and performance improvements on it, but waited for it to land here. Has the transition been completed?

sorry, I think this is still relevant and should be done. The work on store v2 is taking longer than expected

tac0turtle avatar Jul 31 '22 11:07 tac0turtle

@elias-orijtech my awesome colleague from Orijtech Inc is already on this issue and we are going to work on it, I'll join in but just been super busy wrapping up quarter things but also on other security and performance issues and fixes.

odeke-em avatar Aug 03 '22 22:08 odeke-em

I've built gaiad, initialized a main net node, downloaded a Quicksync pruned snapshot, unpacked it. However, I'm not sure how to reproduce the error, or maybe I have to wait for something to settle. Here's my log (@odeke-em's p2p peer isn't available anymore):

$ ./build/gaiad start --cpu-profile cpu2.prof
8:08PM INF starting CPU profiler profile=cpu2.prof
8:08PM INF starting node with ABCI Tendermint in-process
8:08PM INF service start impl=multiAppConn module=proxy msg={}
8:08PM INF service start connection=query impl=localClient module=abci-client msg={}
8:08PM INF service start connection=snapshot impl=localClient module=abci-client msg={}
8:08PM INF service start connection=mempool impl=localClient module=abci-client msg={}
8:08PM INF service start connection=consensus impl=localClient module=abci-client msg={}
8:08PM INF service start impl=EventBus module=events msg={}
8:08PM INF service start impl=PubSub module=pubsub msg={}
8:08PM INF service start impl=IndexerService module=txindex msg={}
8:08PM INF ABCI Handshake App Info hash="�{�������Ԏ+s�-TKV\x03L|��6}��|��qp" height=11547643 module=consensus protocol-version=0 software-version=main-b021d36a89c9515d87032aa3ec5f4e9633b3604f
8:08PM INF ABCI Replay Blocks appHeight=11547643 module=consensus stateHeight=11547643 storeHeight=11547643
8:08PM INF Completed ABCI Handshake - Tendermint and App are synced appHash="�{�������Ԏ+s�-TKV\x03L|��6}��|��qp" appHeight=11547643 module=consensus
8:08PM INF Version info block=11 p2p=8 tendermint_version=v0.34.20
8:08PM INF This node is not a validator addr=70FB853D315BBA61F6AA880C2AF819CD40A3AEED module=consensus pubKey=ZgHBgP6svB/4NeA3fogGHxNhQlmSRXNxykcPc3cM518=
8:08PM INF P2P Node ID ID=328839a6bda98a9cd01a3fd5649e76fdb2bfb3b8 file=/Users/a/.gaia/config/node_key.json module=p2p
8:08PM INF Adding persistent peers addrs=[] module=p2p
8:08PM INF Adding unconditional peer ids ids=[] module=p2p
8:08PM INF Add our address to book addr={"id":"328839a6bda98a9cd01a3fd5649e76fdb2bfb3b8","ip":"0.0.0.0","port":26656} book=/Users/a/.gaia/config/addrbook.json module=p2p
8:08PM INF service start impl=Node msg={}
8:08PM INF Starting pprof server laddr=localhost:6060
8:08PM INF serve module=rpc-server msg={}
8:08PM INF service start impl="P2P Switch" module=p2p msg={}
8:08PM INF service start impl=StateSync module=statesync msg={}
8:08PM INF service start impl=PEX module=pex msg={}
8:08PM INF service start book=/Users/a/.gaia/config/addrbook.json impl=AddrBook module=p2p msg={}
8:08PM INF service start impl=BlockchainReactor module=blockchain msg={}
8:08PM INF service start impl=BlockPool module=blockchain msg={}
8:08PM INF service start impl=ConsensusReactor module=consensus msg={}
8:08PM INF Reactor  module=consensus waitSync=true
8:08PM INF service start impl=Evidence module=evidence msg={}
8:08PM INF Saving AddrBook to file book=/Users/a/.gaia/config/addrbook.json module=p2p size=0

My profile so far:

pprof001

which doesn't seem related.

elias-orijtech avatar Aug 06 '22 18:08 elias-orijtech

this issue is still present in gaiad-v7.0.3 - it's been over a year and i don't think it's been patched

failmode

visualbasic6 avatar Sep 26 '22 15:09 visualbasic6

I'd love to take another look. However, so far I failed to reproduce the issue. Can you provide me with instructions to replicate the slow startup?

elias-orijtech avatar Sep 26 '22 15:09 elias-orijtech

I believe this is fixed in the latest release of 0.45.x. Gaia has not updated to it yet though

tac0turtle avatar Sep 27 '22 23:09 tac0turtle

this showed up in Juno and this PR solved it (https://github.com/cosmos/cosmos-sdk/pull/12885), closing for now unless the newer versions still have the issue

tac0turtle avatar Sep 27 '22 23:09 tac0turtle