CLN not properly detecting confirmed transactions
Issue and Steps to Reproduce
From time to time, some UTXOs are not detected by CLN, and we have to restart CLN with rescan=n (with n at least a few hundred) to recover the UTXOs of the last few days. The issue is very hard to mitigate because the rescan is very slow (a few seconds per block) and we don't really know whether transactions confirmed in older blocks were missed. In fact there are two issues:
- UTXOs are not detected
- CLN cannot make any transaction until the rescan is finished.
Stupid question: before restarting, are you sure that the block height is >= the block where the tx is confirmed?
I didn't check, but CLN considered as confirmed some transactions that were confirmed after the undetected txs.
Is there any progress on that? Is this related to #6929 ?
Let's add a bit of detail to this. Here are a couple of questions to get a base understanding of what happens.
- [ ] Is there any common denominator in the missing UTXOs? Are they commonly from incoming on-chain payments, change outputs, or channel close outputs?
- [ ] Is the node fully synced with the blockchain? If not, is the UTXO-creating TX already processed by CLN?
- [ ] Have you checked in the `outputs` table whether the output exists? The `prev_out_hash` is big endian, so check against the byte-reversed string to see if it matches with the blockchain. If there is an entry, it might have been reserved or spent in the meantime which would cause it to not appear in the `listfunds` outputs.
- [ ] Can you share the logs emitted at the time the block confirming the UTXO-creating transaction was processed? It emits an `Owning output [hash]:[idx]` message when it detects an output it owns. If in doubt feel free to share more logs, we can narrow it down to the essential.
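For the byte-reversal check mentioned above, a minimal Python sketch may help. The helper name `reverse_txid_bytes` is hypothetical and not part of CLN; it only flips the byte order of a hex string so the value stored in the database can be compared with the txid shown by a block explorer or bitcoind.

```python
def reverse_txid_bytes(txid_hex: str) -> str:
    """Return the given hex string with its byte order reversed.

    Hypothetical helper: it does not talk to CLN or bitcoind, it only
    converts between the two byte orders of a transaction hash.
    """
    # bytes.fromhex raises ValueError on malformed or odd-length input.
    return bytes.fromhex(txid_hex)[::-1].hex()
```

For example, `reverse_txid_bytes("0a1b2c")` returns `"2c1b0a"`. If neither the stored value nor its reversed form matches the txid on chain, the output is most likely not in the table at all.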
* [ ] Is there any common denominator in the missing UTXOs? Are they commonly from incoming on-chain payments, change outputs, or channel close outputs?
I think it's mostly change outputs
* [ ] Is the node fully synced with the blockchain? If not, is the UTXO-creating TX already processed by CLN?
Yes. The node is fully synced and it even "knows" tx confirmed in blocks coming after the block including the problematic tx
* [ ] Have you checked in the `outputs` table whether the output exists? The `prev_out_hash` is big endian, so check against the byte-reversed string to see if it matches with the blockchain. If there is an entry, it might have been reserved or spent in the meantime which would cause it to not appear in the `listfunds` outputs.
I only used CLI commands, but I added the spent parameter to listfunds, so the output should appear even if it was spent. In almost all cases, the UTXOs not detected by CLN were neither spent nor reserved.
* [ ] Can you share the logs emitted at the time the block confirming the UTXO-creating transaction was processed? It emits an `Owning output [hash]:[idx]` message when it detects an output it owns. If in doubt feel free to share more logs, we can narrow it down to the essential.
I will try to share privately with you the next time I see the problem
This should be addressed in #7567
@cdecker In order to verify that the issue is indeed fixed, can we query the db for all the tracked unconfirmed? Then we can check if some of them are confirmed. We can also use this method to fix old missed txs.
I'm afraid that may not be as simple. The reason being that we add outputs to our wallet as unconfirmed only if the transaction was originally sent by us. This would be the case for expired HTLCs and closes being swept, change from fundings, etc. But since all other outputs are being extracted from blocks we process, there is no stub for incoming onchain transactions that did not originate from us.
Checking for completeness and correctness is hard, especially if you don't have a ground truth to check against. What we could do however is verify that there is an improvement:
- Copy the node to a VM that has a fully synced bitcoind but is otherwise not connected to the network (since we're going to start a copy of the node we could otherwise cheat inadvertently, so make sure this CLN copy cannot send transactions or connect to its peers)
- Run this isolated copy against the offline bitcoind, with `--rescan=200000` (replace 200000 with whatever range you want to rescan)
- Take a snapshot of the `outputs` table of the offline and online node and compare them
This method can tell us whether there has been an improvement, but not whether further issues remain. Also, I chose to copy the node because rescanning may take some time, during which the node can't really route or be used otherwise. Keeping an online copy for those operations, while using an offline copy to verify, avoids being offline for a prolonged period.
We can then merge the results into the online node by filling in the missing outputs in the `outputs` table and then calling `dev-rescan-outputs` to sync their status with the chain.
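The snapshot comparison described above could be sketched roughly as follows. This assumes both nodes use the sqlite3 backend and that you work on copies of the database files, never on the live one. The function names are hypothetical, and the default key columns `prev_out_tx` and `prev_out_index` are assumptions (the column holding the hash is referred to as `prev_out_hash` earlier in this thread), so check them against `.schema outputs` before relying on this.

```python
import sqlite3


def fetch_keys(conn, table, key_columns):
    """Return the set of key tuples found in `table`.

    `table` and `key_columns` are interpolated into the SQL text, so only
    pass trusted, hard-coded names here.
    """
    cols = ", ".join(key_columns)
    rows = conn.execute(f"SELECT {cols} FROM {table}").fetchall()
    return {tuple(row) for row in rows}


def missing_outputs(online, offline, table="outputs",
                    key_columns=("prev_out_tx", "prev_out_index")):
    """Outputs the rescanned offline copy knows about but the online node lacks.

    The default column names are assumptions, not confirmed in this thread.
    """
    return (fetch_keys(offline, table, key_columns)
            - fetch_keys(online, table, key_columns))
```

Called with two `sqlite3.connect(...)` handles opened on the copied files, `missing_outputs(online, offline)` returns the rows that only the rescanned copy has. An empty set shows that the rescan found nothing new in that range; it does not prove that no output was ever missed.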