Pooling Wallet - one possible root cause for syncing issues
The long_sync operation of a big wallet, when the wallet has many pool_state_transitions, is very inefficient.
I've tested sync operations with a testnet wallet with ~80k coins and presumably ~70k pool_state_transitions.
Process Flow
Here is a simplified process flow of how I understood the relevant parts of the sync.
When the long sync begins with the Pooling Wallet sync, this loop is executed:
https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_state_manager.py#L1087
The inefficiency comes from the call of the apply_state_transition function.
https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_state_manager.py#L1091
Within this function we first call another function to add the spend to the db: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/pools/pool_wallet.py#L284
BEGIN add_spend
Before the spend is actually added to the db, the following two queries are executed: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L47-L52 https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L61-L64
END add_spend
Then we enter this loop: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/pools/pool_wallet.py#L288-L297
where we basically select all known spends of the pooling wallet: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/pools/pool_wallet.py#L290
To get all spends, this SQL query is executed: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L97-L98
Then we process the next coin, starting the whole loop again.
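To make the cost of this pattern visible, here is a small, self-contained toy model of the read pattern described above (this is not the actual chia code; the schema is reduced to the columns mentioned in this issue and the values are only illustrative):

```python
# Toy model of the per-coin read pattern (not the actual chia code; schema and
# values are simplified). For every new spend we re-read the rows already
# stored for the wallet, so the total number of rows read grows quadratically.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pool_state_transitions("
    "transition_index integer, wallet_id integer, height bigint, coin_spend blob, "
    "PRIMARY KEY(transition_index, wallet_id))"
)

wallet_id = 2
rows_read = 0
for i in range(1000):  # pretend we sync 1000 state transitions
    # add_spend: look up the current tip and check whether the spend already exists
    rows_read += len(conn.execute(
        "SELECT transition_index, height, coin_spend FROM pool_state_transitions "
        "WHERE wallet_id=? ORDER BY transition_index DESC LIMIT 1",
        (wallet_id,)).fetchall())
    rows_read += len(conn.execute(
        "SELECT height, coin_spend FROM pool_state_transitions "
        "WHERE wallet_id=? AND height=? AND coin_spend=?",
        (wallet_id, i, b"spend")).fetchall())
    conn.execute("INSERT OR IGNORE INTO pool_state_transitions VALUES (?, ?, ?, ?)",
                 (i, wallet_id, i, b"spend"))
    # apply_state_transition: select every known spend of the wallet again
    rows_read += len(conn.execute(
        "SELECT height, coin_spend FROM pool_state_transitions "
        "WHERE wallet_id=? ORDER BY transition_index",
        (wallet_id,)).fetchall())

print(rows_read)  # roughly 500k rows read for only 1000 transitions
```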
Possible Problems
- For every coin, we select the whole pool_state_transitions table 3 times:
  - To get the transition_index for the wallet, starting here: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L47 What is the reason for this query? If we only want to get the highest transition_index for the given wallet, then we could probably optimize the query.
  - To get the number of transitions for a given wallet, height and spend, starting here: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L61 I guess this query checks whether the current spend should be added to the database. The check is done by selecting rows filtered by wallet_id, height and coin_spend. The PK of the table pool_state_transitions is (transition_index, wallet_id) and there are no other indexes.
  - To get all heights and coin_spends for a given wallet, starting here: https://github.com/Chia-Network/chia-blockchain/blob/244a9dcc913e72353c56e47ec4e1f8c03466afaf/chia/wallet/wallet_pool_store.py#L97
- So, every time we loop through the apply_state_transition function, we most probably add a new spend and then select all old spends plus the new spend three times (twice the whole table, once the row count), basically reading the whole table, as explained above. This adds up quickly and leads to a lot of redundant, unnecessary(?) work.
I don't really understand the design decisions behind implementing the sync this way with regard to the SQLite DB, but I'm pretty sure this particular problem could be mitigated by at least not reading the whole, growing table again and again.
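As an illustration of what I mean, the following sketch shows more targeted queries plus a supporting index (untested, just a suggestion against a reduced schema; the index name is made up and this is not a patch against the actual wallet code):

```python
# Sketch of more targeted queries (untested suggestion, not the actual chia
# implementation; the index name is made up for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pool_state_transitions("
    "transition_index integer, wallet_id integer, height bigint, coin_spend blob, "
    "PRIMARY KEY(transition_index, wallet_id))"
)
conn.execute("INSERT INTO pool_state_transitions VALUES (0, 2, 1000, ?)", (b"spend",))
wallet_id = 2

# 1) Highest transition_index without materializing every row of the wallet:
tip = conn.execute(
    "SELECT MAX(transition_index) FROM pool_state_transitions WHERE wallet_id=?",
    (wallet_id,),
).fetchone()[0]

# 2) Duplicate check backed by an index, so SQLite can stop at the first match
#    instead of counting over a scan of the whole table:
conn.execute(
    "CREATE INDEX pool_transitions_wallet_height "
    "ON pool_state_transitions(wallet_id, height)"
)
already_there = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM pool_state_transitions "
    "WHERE wallet_id=? AND height=? AND coin_spend=?)",
    (wallet_id, 1000, b"spend"),
).fetchone()[0]

# 3) Only read the transitions added since the last one that was already
#    applied, instead of re-reading the full history after every insert:
new_spends = conn.execute(
    "SELECT height, coin_spend FROM pool_state_transitions "
    "WHERE wallet_id=? AND transition_index>? ORDER BY transition_index",
    (wallet_id, tip if tip is not None else -1),
).fetchall()

print(tip, already_there, len(new_spends))  # 0 1 0
```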
Thank you for your analysis. Before we continue the discussion, I would like to get on the same page about the use case you are describing. We did not optimize the pool wallet for 70k state transitions. We have optimized for many standard transactions, not pool wallet singletons.
Do you have a farmer that has won 70k blocks? Or have you changed your pool on chain 70k times? Why exactly do you have so many transactions for the pool wallet?
The data is from the keys we used to test our pool on testnet, farming for a very long time with a decent amount of testnet plots. So once we joined the pool, all other events are from block wins.
From a backup db we know that the last known number of pool state transitions is 28370. As we weren't able to sync the wallet any more (we gave up after 2 days), we switched to new keys.
I actually had time over a few days to review the wallet syncing from a DB perspective and also found a few minor things, which I will create PRs for eventually.
If I find the time, I will try to get some data points on how long it takes and how it slows down after every 500 transitions or so.
I see, so it's testnet keys. The reason this was designed this way was to reduce complexity and make it as simple as possible. We did not consider the case of winning thousands of farming rewards, since it's a rare case.
If there are some relatively simple ways to optimize it, like not re-reading the whole table, we can make these changes.
I had the feeling that this is an extreme edge case, but I still think this could be avoided. Is such a state transition only relevant when farming with pools, or also when using the NFT plot format and claiming self-farmed rewards? At least really big farmers with > 10 PiB who are farming with pools may run into the same problem after some time.
These are the SQL statements executed for every loop:
15:44:45.401262 SELECT height, coin_spend FROM pool_state_transitions WHERE wallet_id=? ORDER BY transition_index
15:44:45.410667 SELECT transition_index, height, coin_spend FROM pool_state_transitions WHERE wallet_id=? ORDER BY transition_index DESC LIMIT 1
15:44:45.412169 SELECT COUNT(*) FROM pool_state_transitions WHERE wallet_id=? AND height=? AND coin_spend=?
15:44:45.412477 INSERT OR IGNORE INTO pool_state_transitions VALUES (?, ?, ?, ?)
15:44:45.412934 SELECT height, coin_spend FROM pool_state_transitions WHERE wallet_id=? ORDER BY transition_index
15:44:45.422234 SELECT height, coin_spend FROM pool_state_transitions WHERE wallet_id=? ORDER BY transition_index
15:44:45.453788 SELECT height, coin_spend FROM pool_state_transitions WHERE wallet_id=? ORDER BY transition_index
15:44:45.549240 SELECT * from coin_record WHERE coin_name=?
15:44:45.549638 SELECT * from coin_record WHERE coin_name=?
15:44:45.549939 INSERT OR FAIL INTO coin_record VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
15:44:45.550090 SELECT last_insert_rowid()
15:44:45.550250 SELECT MIN(derivation_index) FROM derivation_paths WHERE used=0 AND hardened=0;
15:44:45.550484 SELECT MAX(derivation_index) FROM derivation_paths WHERE wallet_id=?
15:44:45.550930 UPDATE coin_record SET spent_height=?,spent=? WHERE coin_name=?
15:44:45.551016 SELECT last_insert_rowid()
The pool_state_transitions table is actually selected 4 times, not 3 times. I missed the get_tip function, which also does this.
And another thing: the comment on the apply_state_transition function mentions the following:
https://github.com/Chia-Network/chia-blockchain/blob/7c9fb654a3563e69c54848d137859856c8566587/chia/pools/pool_wallet.py#L263-L265
But the whole apply process runs in a single transaction, which also blows up the WAL file and, of course, when it is stopped or crashes, it has to start from 0 again.
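For comparison, a generic way to avoid one huge transaction during a long rebuild would be to commit in batches, roughly like this (a generic sqlite3 sketch, not based on chia's db_wrapper; the batch size and progress handling are only illustrative):

```python
# Generic sketch of batched commits during a long rebuild (not chia code; the
# batch size and progress handling are only illustrative).
import sqlite3

def apply_in_batches(conn: sqlite3.Connection, spends, batch_size: int = 1000) -> None:
    applied = 0
    for transition_index, wallet_id, height, coin_spend in spends:
        conn.execute(
            "INSERT OR IGNORE INTO pool_state_transitions VALUES (?, ?, ?, ?)",
            (transition_index, wallet_id, height, coin_spend),
        )
        applied += 1
        if applied % batch_size == 0:
            # Committing periodically keeps the WAL file bounded and lets a
            # restart resume from the last committed transition instead of 0.
            conn.commit()
    conn.commit()
```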
This applies to pool farming, as well as self farming with plot NFTs.
Yes it is extremely inefficient. Could you provide timings on each of those queries and which ones took a long time?
We might be able to fix it by adding a simple cache somewhere. However, I do not want to make extensive and potentially risky changes, since it's not an issue that will affect too many people, while adding a bug would affect many people.
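One minimal shape such a cache could take (purely a sketch; the class and method names here are illustrative and not the actual WalletPoolStore API):

```python
# Minimal sketch of an in-memory cache in front of the pool-spend store
# (illustrative only; class and method names are made up, not chia's API).
from typing import Dict, List, Tuple

class CachedPoolSpendStore:
    def __init__(self, store):
        self._store = store  # the real DB-backed store
        self._cache: Dict[int, List[Tuple[int, bytes]]] = {}

    async def get_spends_for_wallet(self, wallet_id: int) -> List[Tuple[int, bytes]]:
        # Serve repeated reads from memory; only hit SQLite on a cache miss.
        if wallet_id not in self._cache:
            self._cache[wallet_id] = await self._store.get_spends_for_wallet(wallet_id)
        return self._cache[wallet_id]

    async def add_spend(self, wallet_id: int, height: int, coin_spend: bytes) -> None:
        await self._store.add_spend(wallet_id, height, coin_spend)
        # Keep the cache coherent by appending instead of invalidating, so the
        # next apply_state_transition does not have to re-read the table.
        self._cache.setdefault(wallet_id, []).append((height, coin_spend))

    async def rollback(self, height: int, wallet_id: int) -> None:
        await self._store.rollback(height, wallet_id)
        # Rollbacks are rare; simply drop the cached list and reload lazily.
        self._cache.pop(wallet_id, None)
```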
From what I see, the inefficiency comes mainly (or only) from querying the growing pool_state_transitions table.
Short report
TIME: {seconds to process 200 iterations}, NEW_COIN_ITERATIONS: {overall counter}
The following metrics show the average and total time spent, measured before and after this for loop: https://github.com/Chia-Network/chia-blockchain/blob/7c9fb654a3563e69c54848d137859856c8566587/chia/pools/pool_wallet.py#L289
AVG_OVER_LAST_200_ITERS: {average in seconds over the last 200 iterations}
SUM_OVER_LAST_200_ITERS: {sum in seconds over the last 200 iterations}
TIME: 15.338013887405396, NEW_COIN_ITERATIONS: 200
AVG_OVER_LAST_200_ITERS: 0.009473387002944946
SUM_OVER_LAST_200_ITERS: 1.8946774005889893
TIME: 18.56275773048401, NEW_COIN_ITERATIONS: 400
AVG_OVER_LAST_200_ITERS: 0.016466482877731323
SUM_OVER_LAST_200_ITERS: 3.2932965755462646
TIME: 21.860878229141235, NEW_COIN_ITERATIONS: 600
AVG_OVER_LAST_200_ITERS: 0.023045202493667604
SUM_OVER_LAST_200_ITERS: 4.6090404987335205
TIME: 27.917426347732544, NEW_COIN_ITERATIONS: 800
AVG_OVER_LAST_200_ITERS: 0.030841059684753418
SUM_OVER_LAST_200_ITERS: 6.168211936950684
TIME: 34.32459545135498, NEW_COIN_ITERATIONS: 1000
AVG_OVER_LAST_200_ITERS: 0.03513239741325378
SUM_OVER_LAST_200_ITERS: 7.026479482650757
TIME: 32.361149072647095, NEW_COIN_ITERATIONS: 1200
AVG_OVER_LAST_200_ITERS: 0.042580795288085935
SUM_OVER_LAST_200_ITERS: 8.516159057617188
TIME: 35.4493567943573, NEW_COIN_ITERATIONS: 1400
AVG_OVER_LAST_200_ITERS: 0.04850999116897583
SUM_OVER_LAST_200_ITERS: 9.701998233795166
TIME: 39.31749510765076, NEW_COIN_ITERATIONS: 1600
AVG_OVER_LAST_200_ITERS: 0.056016595363616944
SUM_OVER_LAST_200_ITERS: 11.203319072723389
TIME: 42.56333684921265, NEW_COIN_ITERATIONS: 1800
AVG_OVER_LAST_200_ITERS: 0.06123473525047302
SUM_OVER_LAST_200_ITERS: 12.246947050094604
TIME: 45.803821325302124, NEW_COIN_ITERATIONS: 2000
AVG_OVER_LAST_200_ITERS: 0.06745408654212952
SUM_OVER_LAST_200_ITERS: 13.490817308425903
TIME: 49.528172731399536, NEW_COIN_ITERATIONS: 2200
AVG_OVER_LAST_200_ITERS: 0.07036285161972046
SUM_OVER_LAST_200_ITERS: 14.072570323944092
TIME: 52.968520641326904, NEW_COIN_ITERATIONS: 2400
AVG_OVER_LAST_200_ITERS: 0.07603522300720215
SUM_OVER_LAST_200_ITERS: 15.20704460144043
TIME: 56.227105140686035, NEW_COIN_ITERATIONS: 2600
AVG_OVER_LAST_200_ITERS: 0.08831184983253479
SUM_OVER_LAST_200_ITERS: 17.662369966506958
TIME: 59.85309290885925, NEW_COIN_ITERATIONS: 2800
AVG_OVER_LAST_200_ITERS: 0.09517837405204772
SUM_OVER_LAST_200_ITERS: 19.035674810409546
TIME: 63.6175377368927, NEW_COIN_ITERATIONS: 3000
AVG_OVER_LAST_200_ITERS: 0.10020145177841186
SUM_OVER_LAST_200_ITERS: 20.040290355682373
TIME: 68.48028826713562, NEW_COIN_ITERATIONS: 3200
AVG_OVER_LAST_200_ITERS: 0.1073816955089569
SUM_OVER_LAST_200_ITERS: 21.476339101791382
TIME: 70.36668515205383, NEW_COIN_ITERATIONS: 3400
AVG_OVER_LAST_200_ITERS: 0.11387142539024353
SUM_OVER_LAST_200_ITERS: 22.774285078048706
TIME: 74.68989276885986, NEW_COIN_ITERATIONS: 3600
AVG_OVER_LAST_200_ITERS: 0.12173425197601319
SUM_OVER_LAST_200_ITERS: 24.346850395202637
TIME: 77.73890948295593, NEW_COIN_ITERATIONS: 3800
For instance, the first 200 iterations:
TIME: 15.338013887405396, NEW_COIN_ITERATIONS: 200
AVG_OVER_LAST_200_ITERS: 0.009473387002944946
SUM_OVER_LAST_200_ITERS: 1.8946774005889893
You could read this as: it took about 15.3s overall; the for loop took 0.009s on average and 1.894s in total, which is about 12.35% of the overall time.
Between iteration 3600 and 3800:
AVG_OVER_LAST_200_ITERS: 0.12173425197601319
SUM_OVER_LAST_200_ITERS: 24.346850395202637
TIME: 77.73890948295593, NEW_COIN_ITERATIONS: 3800
The processing of these 200 iterations took 77.7s; the for loop took 0.121s on average and 24.34s in total, which is about 31.31% of the overall time.
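For reference, the numbers above come from simple wall-clock instrumentation around the sync loop; a minimal version of that kind of instrumentation looks roughly like this (a sketch, not the exact patch; names and the 200-iteration window are just how the measurements were bucketed):

```python
# Sketch of the timing instrumentation behind the report above (not the exact
# patch; variable names and the 200-iteration bucket are only illustrative).
import time

iteration = 0
window_start = time.monotonic()
loop_times = []

def record_iteration(loop_seconds: float) -> None:
    # Call once per processed coin with the time spent inside the inner
    # "for spend in all_spends" loop of apply_state_transition.
    global iteration, window_start
    iteration += 1
    loop_times.append(loop_seconds)
    if iteration % 200 == 0:
        now = time.monotonic()
        print(f"TIME: {now - window_start}, NEW_COIN_ITERATIONS: {iteration}")
        print(f"AVG_OVER_LAST_200_ITERS: {sum(loop_times) / len(loop_times)}")
        print(f"SUM_OVER_LAST_200_ITERS: {sum(loop_times)}")
        loop_times.clear()
        window_start = now
```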
I have a test wallet on testnet with many claims and wins, and it takes days to sync - I believe this is likely partially explained by this analysis. Assigning to Almog to take a look.