cosmos-sdk
cosmos-sdk copied to clipboard
x/crisis: asserting crisis invariants distribution/can-withdraw takes a very long time
From my profiling and sleuthing, I just noticed that starting a node using gaiad took more than 30 minutes and majority of the time was spent on module=x/crisis name=distribution/can-withdraw per
$ gaiad start --p2p.seeds "[email protected]:26657"
12:34AM INF starting ABCI with Tendermint
12:34AM INF Starting multiAppConn service impl=multiAppConn module=proxy
12:34AM INF Starting localClient service connection=query impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
12:34AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
12:34AM INF Starting EventBus service impl=EventBus module=events
12:34AM INF Starting PubSub service impl=PubSub module=pubsub
12:34AM INF Starting IndexerService service impl=IndexerService module=txindex
12:34AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=
12:34AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0
12:34AM INF asserting crisis invariants inv=0/12 module=x/crisis name=bank/nonnegative-outstanding
12:34AM INF asserting crisis invariants inv=1/12 module=x/crisis name=bank/total-supply
12:34AM INF asserting crisis invariants inv=2/12 module=x/crisis name=distribution/nonnegative-outstanding
12:34AM INF asserting crisis invariants inv=3/12 module=x/crisis name=distribution/can-withdraw
1:07AM INF asserting crisis invariants inv=4/12 module=x/crisis name=distribution/reference-count
1:07AM INF asserting crisis invariants inv=5/12 module=x/crisis name=distribution/module-account
1:07AM INF asserting crisis invariants inv=6/12 module=x/crisis name=liquidity/escrow-amount
1:07AM INF asserting crisis invariants inv=7/12 module=x/crisis name=staking/module-accounts
1:07AM INF asserting crisis invariants inv=8/12 module=x/crisis name=staking/nonnegative-power
1:07AM INF asserting crisis invariants inv=9/12 module=x/crisis name=staking/positive-delegation
1:07AM INF asserting crisis invariants inv=10/12 module=x/crisis name=staking/delegator-shares
1:08AM INF asserting crisis invariants inv=11/12 module=x/crisis name=gov/module-account
1:08AM INF asserted all invariants duration=2065579.553016 height=5200791 module=x/crisis
1:08AM INF created new capability module=ibc name=ports/transfer
1:08AM INF port binded module=x/ibc/port port=transfer
1:08AM INF claimed capability capability=1 module=transfer name=ports/transfer
In late February/March 2021 we had brought down start time to a very short period, but this is a huge increase. We need to investigate what's going wrong.
I've examined live traffic for the past 1.5 hours and a pain point seems to be the expected

I know there has been talk of replacing the code in iavl, which is why my team and I held from from optimizing it, but it's been 6+ months and perhaps it is time for us to dive in deep and figure out what could be improved.
/cc @cuonglm @kirbyquerby
For Admin Use
- [x] Not duplicate issue
- [ ] Appropriate labels applied
- [ ] Appropriate contributors tagged
- [ ] Contributor assigned/self-assigned
@odeke-em which gaia version did you use?
Ooooo yes, this is a very well known issue to the point we encourage operators to start with --skip-crisis-invariants. But that really defeats the purpose of having x/crisis as a module/function, so we should really strive to improve the performance of this. But I imagine not only the problem but also the improvements will be multi-faceted.
Also, it is true we're moving to SMT in place of IAVL, at that work is very active. I don't know estimated completion times (cc @aaronc), so based on the provided benchmark it seems to be completely IAVL related. Not sure if it's worth digging into this...?
It will take few more months to have ADR-40 implemented and merged.
IAVL and current store package won't be deprecated immediately and we will support both SMT and IAVL in the SDK for few more releases.
Awesome, thank you @alexanderbez and @robert-zaremba for the go-ahead that we've been waiting for cosmos/iavl, for a while. We've finished auditing celestiaorg/smt (publishing the official report in a few hours) and now we can go forth and improving cosmos/iavl and cosmos/cosmos-sdk/store/* performance.
Given what @robert-zaremba stated and if you have bandwidth and interest, it would be helpful to solve this or at least figure out where the major bottlenecks are @odeke-em 🙏
Given what @robert-zaremba stated and if you have bandwidth and interest, it would be helpful to solve this or at least figure out where the major bottlenecks are @odeke-em 🙏
Roger that @alexanderbez, we are on it and yes we do have the bandwidth to work on it :-)
Eureka! After experimenting this afternoon and evening with various tricks, I think I have an idea that'll definitely slash x/crisis time down from 30+ minutes to like less than a minute or so.
Prognosis
In hundreds of profiles, we see that this code is super hot and the cause of that delay https://github.com/cosmos/cosmos-sdk/blob/00f987ac67981173898a66f9b6b2401ff74854ec/store/cachekv/store.go#L191-L196
per
ROUTINE ======================== github.com/cosmos/cosmos-sdk/store/cachekv.(*Store).dirtyItems in github.com/cosmos/[email protected]/store/cachekv/store.go
1.42s 35.02s (flat, cum) 60.06% of Total
. . 211:func (store *Store) dirtyItems(start, end []byte) {
. . 212: unsorted := make([]*kv.Pair, 0)
. . 213:
. . 214: startStr, endStr := byteSliceToStr(start), byteSliceToStr(end)
. . 215: n := len(store.unsortedCache)
240ms 4.67s 216: for key := range store.unsortedCache {
430ms 430ms 217: if key == "" {
. . 218: println("blank key")
. . 219: }
. . 220: }
740ms 5s 221: for key := range store.unsortedCache {
10ms 24.79s 222: if isKeyInDomain(key, startStr, endStr) {
. 10ms 223: cacheValue := store.cache[key]
. 10ms 224: unsorted = append(unsorted, &kv.Pair{Key: []byte(key), Value: cacheValue.value})
. . 225: }
. . 226: }
. . 227:
. . 228: if len(unsorted) == n { // This pattern allows the Go compiler to emit the map clearing idiom for the entire map.
. . 229: for key := range store.unsortedCache {
. . 230: delete(store.unsortedCache, key)
. . 231: }
. . 232: } else { // Otherwise, normally delete the unsorted keys from the map.
. . 233: for _, kv := range unsorted {
. 10ms 234: delete(store.unsortedCache, byteSliceToStr(kv.Key))
. . 235: }
. . 236: }
. . 237:
. 50ms 238: sort.Slice(unsorted, func(i, j int) bool {
. 50ms 239: return bytes.Compare(unsorted[i].Key, unsorted[j].Key) < 0
. . 240: })
. . 241:
. . 242: for e := store.sortedCache.Front(); e != nil && len(unsorted) != 0; {
. . 243: uitem := unsorted[0]
. . 244: sitem := e.Value
and the culprit is the byte comparisons from invoking dbm.IsKeyInDomain(start, end, key []byte) per https://github.com/tendermint/tm-db/blob/edb35ebc2ea9327e0c0528d0d66bbf39a787461b/util.go#L37-L46, where it essentially for every single key performs a search in unsorted keys, so O(n) in the number of unsorted keys, but then O(klm*n) in the length of start, end and each of the keys, which leads to the huge 30min plus time spent looking for keys within the domain.
Target
We literally want to look through all the unsorted keys and find the values within the range [start, end)
Suggestion
Instead of that chaotic and quadratic search, we can bring this down to somewhat linear time by: a) Sorting all the store.unsortedCache in a slice SL b) Using a modified binary search to find where in SL we have the first keys >= start, eStart c) Using a modified range binary search to find where in SL we have the first value less than end, eEnd d) Iterating just using those indices eStart and eEnd ranged over SL to retrieve the dirty items in range
From a preliminary experiment of walking through the logic and plugging it into store.go I can confirm that this will return results in a very short time. I am going to catch some sleep for now, but I'll send a PR in the morning and also test out speed to reconfirm
@odeke-em are you still interested in working on this?
@odeke-em are you still interested in working on this?
I did mail this PR https://github.com/cosmos/cosmos-sdk/pull/10024 but I was told the store backend was gonna change in a couple of months to use smt, so I didn't do anything else aside from auditing smt and performance improvements on it, but waited for it to land here. Has the transition been completed?
I did mail this PR #10024 but I was told the store backend was gonna change in a couple of months to use smt, so I didn't do anything else aside from auditing smt and performance improvements on it, but waited for it to land here. Has the transition been completed?
sorry, I think this is still relevant and should be done. The work on store v2 is taking longer than expected
@elias-orijtech my awesome colleague from Orijtech Inc is already on this issue and we are going to work on it, I'll join in but just been super busy wrapping up quarter things but also on other security and performance issues and fixes.
I've built gaiad, initialized a main net node, downloaded a Quicksync pruned snapshot, unpacked it. However, I'm not sure how to reproduce the error, or maybe I have to wait for something to settle. Here's my log (@odeke-em's p2p peer isn't available anymore):
$ ./build/gaiad start --cpu-profile cpu2.prof
8:08PM INF starting CPU profiler profile=cpu2.prof
8:08PM INF starting node with ABCI Tendermint in-process
8:08PM INF service start impl=multiAppConn module=proxy msg={}
8:08PM INF service start connection=query impl=localClient module=abci-client msg={}
8:08PM INF service start connection=snapshot impl=localClient module=abci-client msg={}
8:08PM INF service start connection=mempool impl=localClient module=abci-client msg={}
8:08PM INF service start connection=consensus impl=localClient module=abci-client msg={}
8:08PM INF service start impl=EventBus module=events msg={}
8:08PM INF service start impl=PubSub module=pubsub msg={}
8:08PM INF service start impl=IndexerService module=txindex msg={}
8:08PM INF ABCI Handshake App Info hash="�{�������Ԏ+s�-TKV\x03L|��6}��|��qp" height=11547643 module=consensus protocol-version=0 software-version=main-b021d36a89c9515d87032aa3ec5f4e9633b3604f
8:08PM INF ABCI Replay Blocks appHeight=11547643 module=consensus stateHeight=11547643 storeHeight=11547643
8:08PM INF Completed ABCI Handshake - Tendermint and App are synced appHash="�{�������Ԏ+s�-TKV\x03L|��6}��|��qp" appHeight=11547643 module=consensus
8:08PM INF Version info block=11 p2p=8 tendermint_version=v0.34.20
8:08PM INF This node is not a validator addr=70FB853D315BBA61F6AA880C2AF819CD40A3AEED module=consensus pubKey=ZgHBgP6svB/4NeA3fogGHxNhQlmSRXNxykcPc3cM518=
8:08PM INF P2P Node ID ID=328839a6bda98a9cd01a3fd5649e76fdb2bfb3b8 file=/Users/a/.gaia/config/node_key.json module=p2p
8:08PM INF Adding persistent peers addrs=[] module=p2p
8:08PM INF Adding unconditional peer ids ids=[] module=p2p
8:08PM INF Add our address to book addr={"id":"328839a6bda98a9cd01a3fd5649e76fdb2bfb3b8","ip":"0.0.0.0","port":26656} book=/Users/a/.gaia/config/addrbook.json module=p2p
8:08PM INF service start impl=Node msg={}
8:08PM INF Starting pprof server laddr=localhost:6060
8:08PM INF serve module=rpc-server msg={}
8:08PM INF service start impl="P2P Switch" module=p2p msg={}
8:08PM INF service start impl=StateSync module=statesync msg={}
8:08PM INF service start impl=PEX module=pex msg={}
8:08PM INF service start book=/Users/a/.gaia/config/addrbook.json impl=AddrBook module=p2p msg={}
8:08PM INF service start impl=BlockchainReactor module=blockchain msg={}
8:08PM INF service start impl=BlockPool module=blockchain msg={}
8:08PM INF service start impl=ConsensusReactor module=consensus msg={}
8:08PM INF Reactor module=consensus waitSync=true
8:08PM INF service start impl=Evidence module=evidence msg={}
8:08PM INF Saving AddrBook to file book=/Users/a/.gaia/config/addrbook.json module=p2p size=0
My profile so far:
which doesn't seem related.
this issue is still present in gaiad-v7.0.3 - it's been over a year and i don't think it's been patched

I'd love to take another look. However, so far I failed to reproduce the issue. Can you provide me with instructions to replicate the slow startup?
I believe this is fixed in the latest release of 0.45.x. Gaia has not updated to it yet though
this showed up in Juno and this PR solved it (https://github.com/cosmos/cosmos-sdk/pull/12885), closing for now unless the newer versions still have the issue