Dummy txns created for Offers get stuck, causes slowness & DOUBLE_SPEND errors forever

Open joshpainter opened this issue 1 year ago • 10 comments

Summary

I've been hunting this one for months and I think I've finally cracked it - this will be long-winded but hopefully it will help others searching for similar issues.

Wut?

I've noticed for several months now that wallets inexplicably seem to get "slower" over time. The wallets that show this behavior are very active, and they may create and/or accept several offers within the same block. They may also be contending with other users trying to accept the same offer at the same time, as with mint events. Derivation indexes of 20,000 and above are common for these larger, active wallets.

At first I thought it might be related to the derivation index, since the wallet seemed to get slower at accepting offers as the derivation index climbed. But the strange thing is that I could delete the wallet sqlite database file, let it resync, and then the wallet would be nice and fast again, even with a very high derivation index. So the derivation index itself was not the root cause!

Something must be accruing, or "leaking," over time. Let's take a look through the wallet database! I ran across some odd looking records in the transaction_record table:

What is this?!? Several transaction records with weird blank values, a wallet_id of 0 (which doesn't exist), etc. After lots of spelunking, I found this bit of code:

https://github.com/Chia-Network/chia-blockchain/blob/f29eb44ffc14b79b9ad8ee28fb8d9de0c87514ba/chia/wallet/trade_manager.py#L769-L787

When an offer is accepted, this "dummy" txn is created. Once the offer is confirmed, it appears that these dummy txns get cleaned up. Here are the problems:

  • These dummies are not cleaned up if the offer is canceled.
  • They are also not cleaned up if "Delete Unconfirmed Transactions" is used.
  • Failed offers (another user snipes your accept) leave even more orphaned dummy txns in this table.
  • As a result, they build up over time.

How does this affect performance?

It appears that these dummy txns are submitted to the full node over and over, resulting in DOUBLE_SPEND errors and pre_validate_spendbundle warnings in the log.

As these orphaned dummies build up over time, the wallet gets slower and slower at accepting offers and sending txns, because every time it does either, it goes through that huge list of stuck orphaned dummies and tries to submit them to the full node again. In fact, the full node will eventually try to ban the local wallet for this nonsense!
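To see why the slowdown compounds instead of staying flat, here is a toy model (not wallet code). It assumes that every send resubmits every stuck dummy, that each resubmission is rejected with DOUBLE_SPEND, and that each send can orphan a fixed number of new dummies. Under those assumptions, the cumulative number of rejected resubmissions after n sends is k * n * (n - 1) / 2, which grows quadratically.

```python
def total_rejections(num_sends: int, orphans_per_send: int) -> int:
    """Toy model of cumulative DOUBLE_SPEND rejections over a series of sends.

    Hypothetical assumptions: each send retries every currently stuck dummy,
    every retry is rejected, and each send leaves `orphans_per_send` new orphans.
    """
    stuck = 0
    rejections = 0
    for _ in range(num_sends):
        rejections += stuck        # every existing orphan is retried and rejected
        stuck += orphans_per_send  # this send may orphan more dummies
    return rejections
```

With one new orphan per send, 4 sends produce 0 + 1 + 2 + 3 = 6 rejections, and 100 sends produce 4,950. The real cost per retry depends on the wallet and full node, but the shape of the curve is why the slowdown feels worse and worse.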

How to see if you are affected by this issue?

Run this query in your wallet db:

SELECT * FROM transaction_record WHERE wallet_id = 0

If you have no pending offers or transactions, but you still get results from this query, it means you have orphaned dummies.
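The same check can be scripted with Python's standard sqlite3 module. This is a minimal sketch. It assumes you pass in the path to your wallet's sqlite file. The exact file name and location depend on your install and key fingerprint, so look the path up rather than relying on any particular one. The script opens the file read-only, so it cannot modify anything.

```python
import sqlite3


def count_orphan_dummies(db_path: str) -> int:
    """Count transaction_record rows with wallet_id = 0 (the dummy offer txns)."""
    # mode=ro requires a URI; it also refuses to create a new empty file
    # if the path is wrong, instead of silently reporting zero rows.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM transaction_record WHERE wallet_id = 0"
        ).fetchone()
        return count
    finally:
        conn.close()
```

For example, count_orphan_dummies("/path/to/your/wallet.sqlite") returns the number of matching rows. A non-zero result with no pending offers or transactions points to orphaned dummies.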

How to temporarily fix this issue?

Use this statement to clean them up manually (log out of the wallet in the UI/CLI first):

DELETE FROM transaction_record WHERE wallet_id = 0

The more dummy records you clean up, the faster your wallet should feel the next time you start it!
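A scripted version of the manual cleanup could look like the following sketch. It assumes the wallet is fully stopped and that you choose the backup path yourself. Copying the file with sqlite3's Connection.backup before deleting anything is an extra precaution that is not part of the original steps, so you can restore the file if the delete causes trouble.

```python
import sqlite3


def delete_orphan_dummies(db_path: str, backup_path: str) -> int:
    """Back up the wallet DB, then delete transaction_record rows with wallet_id = 0.

    Run this only while the wallet is stopped. Returns the number of rows deleted.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Take a consistent copy of the whole database before modifying it.
        backup = sqlite3.connect(backup_path)
        try:
            conn.backup(backup)
        finally:
            backup.close()
        # "with conn" commits on success and rolls back if the DELETE raises.
        with conn:
            cur = conn.execute("DELETE FROM transaction_record WHERE wallet_id = 0")
        return cur.rowcount
    finally:
        conn.close()
```

The return value indicates how many dummy rows were removed. The backup file is a complete sqlite database that can be copied back into place if needed.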

I'm not sure about a longer-term fix, or I would have submitted a PR, but I think I'd start by including the wallet_id when creating these dummy txns instead of setting it to zero:

https://github.com/Chia-Network/chia-blockchain/blob/f29eb44ffc14b79b9ad8ee28fb8d9de0c87514ba/chia/wallet/trade_manager.py#L781

This would mean that they would naturally get cleaned up by delete_unconfirmed_transactions:

https://github.com/Chia-Network/chia-blockchain/blob/f29eb44ffc14b79b9ad8ee28fb8d9de0c87514ba/chia/wallet/wallet_transaction_store.py#L356

Longer term, it would be nice to add some logic to trade_store.set_status so that these dummy txns get cleaned up when offers are canceled or fail: https://github.com/Chia-Network/chia-blockchain/blob/f29eb44ffc14b79b9ad8ee28fb8d9de0c87514ba/chia/wallet/trading/trade_store.py#L177
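A minimal sketch of the set_status idea follows. It assumes a simplified schema in which each dummy row in transaction_record carries the trade_id of its offer. The table layouts, the TradeStatus stand-in, and its numeric values are all hypothetical, not the wallet's actual definitions. The status update and the cleanup run in a single transaction. Only unconfirmed wallet_id = 0 rows are removed, so real transactions linked to the trade are left alone.

```python
import sqlite3
from enum import IntEnum


class TradeStatus(IntEnum):
    # Hypothetical stand-in for the wallet's trade status enum.
    PENDING_ACCEPT = 0
    PENDING_CONFIRM = 1
    PENDING_CANCEL = 2
    CANCELLED = 3
    CONFIRMED = 4
    FAILED = 5


# Statuses after which an offer's dummy txn can never confirm.
TERMINAL_FAILURES = {TradeStatus.CANCELLED, TradeStatus.FAILED}


def set_status(conn: sqlite3.Connection, trade_id: str, status: TradeStatus) -> None:
    """Update a trade's status and drop its unconfirmed dummy txns if it died."""
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(
            "UPDATE trade_records SET status = ? WHERE trade_id = ?",
            (int(status), trade_id),
        )
        if status in TERMINAL_FAILURES:
            conn.execute(
                "DELETE FROM transaction_record "
                "WHERE trade_id = ? AND wallet_id = 0 AND confirmed = 0",
                (trade_id,),
            )
```

Tying the cleanup to the status transition means canceled and sniped offers stop leaving orphans behind. The existing confirm path already handles the successful case.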

Fin

Anyway, hopefully this detail helps get this bug fixed extra-quick, because I have a sneaky suspicion that it has been one of the main causes of wallet slowness and DOUBLE_SPEND errors in logs!

Thank you for attending my New Issue Talk.

joshpainter • Mar 01 '23 19:03

Actually I think we might be able to fix this whole mess, including retroactively for any users suffering from this now, with This One Simple Fix (adding OR wallet_id=0 to the existing statement):

await conn.execute("DELETE FROM transaction_record WHERE confirmed=0 AND (wallet_id=? OR wallet_id=0)", (wallet_id,))

https://github.com/Chia-Network/chia-blockchain/blob/f29eb44ffc14b79b9ad8ee28fb8d9de0c87514ba/chia/wallet/wallet_transaction_store.py#L356
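The effect of the proposed change can be checked against a throwaway in-memory table. The schema below is a simplified, hypothetical stand-in for the real transaction_record table, keeping only the columns the DELETE touches. ORIGINAL reproduces the statement with the proposed clause removed, and PATCHED is the proposed statement.

```python
import sqlite3
from typing import List

ORIGINAL = "DELETE FROM transaction_record WHERE confirmed=0 AND wallet_id=?"
PATCHED = "DELETE FROM transaction_record WHERE confirmed=0 AND (wallet_id=? OR wallet_id=0)"


def remaining_after(query: str, wallet_id: int) -> List[str]:
    """Run one DELETE variant against a toy table and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema: the real table has many more columns.
    conn.execute("CREATE TABLE transaction_record (name TEXT, wallet_id INTEGER, confirmed INTEGER)")
    conn.executemany(
        "INSERT INTO transaction_record VALUES (?, ?, ?)",
        [
            ("pending_send", 1, 0),    # unconfirmed txn in wallet 1
            ("confirmed_send", 1, 1),  # confirmed txn, must survive
            ("offer_dummy", 0, 0),     # orphaned dummy left behind by an offer
        ],
    )
    conn.execute(query, (wallet_id,))
    rows = [r[0] for r in conn.execute("SELECT name FROM transaction_record ORDER BY name")]
    conn.close()
    return rows
```

With wallet 1, remaining_after(ORIGINAL, 1) leaves both confirmed_send and offer_dummy, while remaining_after(PATCHED, 1) leaves only confirmed_send. There is a trade-off, though. The OR wallet_id=0 clause ignores which wallet is being cleared, so remaining_after(PATCHED, 2) also deletes the dummy. As a result, clearing unconfirmed transactions in any wallet would also discard the dummy for an offer that is still legitimately in flight.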

joshpainter • Mar 01 '23 19:03

@joshpainter thanks for reporting and well done doing a deep dive into the wallet code. Indeed, what you propose with set_status is roughly what we're currently working on. Hopefully can resolve this fairly quickly.

trepca • Mar 01 '23 20:03

@joshpainter can you try https://github.com/Chia-Network/chia-blockchain/pull/14722 and see if it fixes the problem

trepca • Mar 06 '23 15:03

@trepca Unfortunately it does not, but it does change things! Now the offer files stay "pending" forever, even after I've accepted them and they show as confirmed transactions. The 'dummy' records I mentioned above are also still created, left in the transactions table, and never cleaned up. Verify with the SQL below after accepting and confirming offers:

SELECT * FROM transaction_record WHERE wallet_id = 0

I think we went backwards with this one. 😢

joshpainter • Mar 07 '23 13:03

> @joshpainter can you try #14722 and see if it fixes the problem

Hey @trepca, it looks like you fixed these orphaned records in the latest 1.7.1-rc2! I no longer see the orphaned records using the SQL above after accepting offers.

However, now there is another weird problem that might be related to your fix. If I try to create a second offer that uses the same amount of assets as an outstanding pending offer, I get a pop-up dialog in the UI.

I have plenty of confirmed balance of those assets, split across lots of smaller coins, so it should be able to choose a different coin than the one locked by the outstanding pending offer. If I click "Proceed", the new offer doesn't seem to be created. It seems like it is keying off of the outstanding offer amounts instead of the underlying locked coins?
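The suspicion that the check keys off offer amounts instead of the underlying locked coins can be illustrated with two hypothetical checks. Neither one is the GUI's or wallet's actual logic, and OfferSummary is an invented, simplified type. Two offers can only genuinely conflict if they try to spend a coin in common. A coin-based check therefore lets a second same-sized offer through when that offer is funded from a different coin.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class OfferSummary:
    """Hypothetical, simplified view of an offer (not a chia type)."""
    asset_id: str
    amount: int
    locked_coin_ids: FrozenSet[str]


def conflicts_by_amount(new: OfferSummary, pending: List[OfferSummary]) -> bool:
    # Over-eager: flags any pending offer with the same asset and amount.
    return any(p.asset_id == new.asset_id and p.amount == new.amount for p in pending)


def conflicts_by_coins(new: OfferSummary, pending: List[OfferSummary]) -> bool:
    # Offers conflict only if they would spend at least one coin in common.
    return any(new.locked_coin_ids & p.locked_coin_ids for p in pending)
```

For two 100-unit offers of the same asset funded from different coins, conflicts_by_amount returns True while conflicts_by_coins returns False. Offers of different assets that happen to lock the same coin are the reverse case.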

Let me know if I should make a new issue for this. Thanks for your work on this!

joshpainter • Mar 19 '23 03:03

@joshpainter interesting, do you see any WARNING or ERROR entries in logs?

trepca • Mar 20 '23 15:03

@joshpainter Thanks for bringing that up. The dialog is new to 1.7.1, but in the RC2 build it was a bit overzealous in prompting the user to close out possibly "conflicting" offers when in fact there is no conflict. A fix for this has been merged and will be out in the next RC build.

paninaro • Mar 20 '23 17:03

Thank you both. I'm now on 1.7.1 and I'm still seeing quite a bit of strangeness.

I've created a new wallet to take advantage of the new reuse_public_key_for_change setting. This seems to work well; at least we can rule out a large derivation index as the culprit!

However, I still have weird issues when I try to accept multiple offers in the same block. Sometimes one of them will confirm but others will stay "pending" forever (until I cancel them). Other times it won't even let me accept an offer because I have another offer pending with the same assets - it thinks I'm trying to accept a duplicate offer, but in fact they are different offers - the amounts and assets are just the same. Is it using asset ID and asset amount as a unique key for offers somewhere maybe?!?

I no longer see the "dummy" records in the trade_records table, but I see all the other stuck offers. I've tried to manually update their statuses, or delete them entirely, but that just seems to mess up other stuff.

Are there currently any test cases around accepting multiple offers in the same block, specifically offers with the same asset ID and amounts? I think they would reveal all of these issues pretty quickly. Thanks!

joshpainter • Mar 24 '23 08:03

Some more detail that may or may not help in tracking this down:

  • I tried using the new set_wallet_resync_on_startup and saw the new setting applied in config.yaml. I restarted the wallet and it definitely seemed to take longer than normal, but it did not fix the "stuck" offers.
  • I have configured automatically_add_unknown_cats to true and connect_to_unknown_peers to false.
  • Running full node with local network set for exempt_peer_networks
  • When I have these "stuck" offers, my log file is filled with DOUBLE_SPEND errors but not much else.
  • I have to delete the whole wallet db and resync to get out of this state. I was hoping set_wallet_resync_on_startup would do this for me, but it doesn't seem to clear offers/txns?
  • I'm using a mix of the UI and the RPC to accept/create offers.

joshpainter • Mar 24 '23 09:03

Same issues here. Please check your derivation index. I have found that creating offers, even ones that are never executed on chain, increases the wallet's derivation index. The problem is that your wallet can end up at a high derivation index even though no transactions have been made, and as soon as one offer or transaction is executed, this high derivation index is written to the blockchain.

Furthermore, at a couple of thousand derivation indexes, the wallet starts getting slow. Soon after, it starts bugging out with transactions like the ones you mentioned. At around 100k-200k it becomes unusable, and the funds have to be recovered with an offline signer.

As long as no transaction has been executed at this high derivation index, you can reset the derivation index by deleting the wallet database and resyncing it (since nothing at that index has been recorded on chain). set_wallet_resync_on_startup does not help in that case, because it keeps the derivation index. Deleting the database does not help either once offers have been executed or canceled on chain, or once you have sent transactions.
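One way to check the derivation index directly is to query the wallet database. This sketch assumes the wallet DB has a derivation_paths table with a derivation_index column, so verify that against your own file's schema before relying on it. The function opens the database read-only and returns -1 if no paths are recorded. Comparing the value before and after creating an offer would show whether offer creation is advancing the index.

```python
import sqlite3


def max_derivation_index(db_path: str) -> int:
    """Return the highest derivation index recorded in a wallet DB, or -1 if none.

    Assumption: the table is named derivation_paths and has a derivation_index column.
    """
    # Read-only URI so inspecting the file cannot change it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        (value,) = conn.execute(
            "SELECT MAX(derivation_index) FROM derivation_paths"
        ).fetchone()
        return -1 if value is None else value
    finally:
        conn.close()
```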

KryptomineCH • Mar 16 '24 12:03