bitcoin
bitcoin copied to clipboard
assumeutxo: nTx and nChainTx violations in CheckBlockIndex
When disabling the "test-only" assumptions in CheckBlockIndex, the check fails. This is problematic, because test-only code should not affect the behavior of the program in production.
Current diff:
diff --git a/src/validation.cpp b/src/validation.cpp
index 8c583c586c..00d7ee3736 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -4866,9 +4866,9 @@ void ChainstateManager::CheckBlockIndex()
unsigned int prev_chain_tx = pindex->pprev ? pindex->pprev->nChainTx : 0;
assert((pindex->nChainTx == pindex->nTx + prev_chain_tx)
// For testing, allow transaction counts to be completely unset.
- || (pindex->nChainTx == 0 && pindex->nTx == 0)
+ //|| (pindex->nChainTx == 0 && pindex->nTx == 0)
// For testing, allow this nChainTx to be unset if previous is also unset.
- || (pindex->nChainTx == 0 && prev_chain_tx == 0 && pindex->pprev)
+ //|| (pindex->nChainTx == 0 && prev_chain_tx == 0 && pindex->pprev)
// Transaction counts prior to snapshot are unknown.
|| pindex->IsAssumedValid());
Steps to reproduce the crash:
$ ./src/qt/bitcoin-qt -datadir=/tmp -regtest
> generatetodescriptor 1 raw(ff)
Related to https://github.com/bitcoin/bitcoin/pull/28791 and https://github.com/bitcoin/bitcoin/pull/28562#discussion_r1350470707
$ ./src/qt/bitcoin-qt -datadir=/tmp -regtest
Do we not consider regtest "testing"?
ISTM that only "test mode" (regtest) will fail the assertion with the lines removed.
Do we not consider regtest "testing"?
The comment is vague, but it's really referring to unit tests.
When I suggested adding the two conditions allowing pindex->nChainTx == 0
"for testing" in https://github.com/bitcoin/bitcoin/pull/27596#discussion_r1323226418, the reason was to avoid crashes in unit tests, because some unit tests skipped calling ReceivedBlockTransactions
and therefore skipped setting a nChainTx
value:
https://github.com/bitcoin/bitcoin/blob/651fb034d85eb5db561bfd24b74f7271417defa5/src/validation.cpp#L3613
But outside unit tests the condition pindex->nChainTx == pindex->nTx + prev_chain_tx
should be true and the assert should succeed.
So there is unexpected behavior here if bitcoin-qt is crashing without these conditions, and maybe there is a real bug. Also this code could probably be improved by updating unit tests to set nChainTx correctly, so the two special case conditions in the assert allowing pindex->nChainTx == 0
could be dropped.
So there is unexpected behavior here if bitcoin-qt is crashing without these conditions
This is not specific to bitcoin-qt
, or assumeutxo. For example, It also happens with bitcoin-cli -regtest -generate 1
(after creating an empty wallet) on an empty datadir.
I think the comment // For testing, allow transaction counts to be completely unset.
is wrong. CheckBlockIndex
traverses the entire block index tree, no matter if we have the block on disk or not. If we don't have it, ReceivedBlockTransactions
hasn't been called, and both nTx
and nChainTx
are 0.
Therefore, the check assert((pindex->nChainTx == pindex->nTx + prev_chain_tx)
cannot be applied to headers for which we never processed the block data, and this has nothing to do with AssumeUtxo or Testing.
I'll open a PR with a fix tomorrow unless someone disagrees.
Oops, sorry I missed that context so my explanation doesn't really describe what the checks are doing (although it does describe what they were intending to do). If the PR is just going to change the comments and delete the words "For testing," that sounds good.
Other notes on improving this assert:
- The final pindex->IsAssumedValid() check added in #28791 seems a little overbroad. I think the condition
(pindex->nChainTx == pindex->nTx + prev_chain_tx)
should actually be true for all assumed-valid blocks except the first one and the last one, so that check could be tightened up - It might possible to add a test covering the assert failure that pindex->IsAssumedValid() from #28791 fixes, since #28791 didn't seem to include a test
$ ./src/qt/bitcoin-qt -datadir=/tmp -regtest
Do we not consider regtest "testing"?
Regtest should have the same properties as main
for the purposes here. It should be possible to reproduce on main as well, if you want to do the POW.
Ah I see now, thanks. The repro steps threw me off there. I did now also verify that it is hit on (a custom) signet (with -checkblocks
set).
- The final pindex->IsAssumedValid() check added in #28791 seems a little overbroad. I think the condition
(pindex->nChainTx == pindex->nTx + prev_chain_tx)
should actually be true for all assumed-valid blocks except the first one and the last one, so that check could be tightened up
After trying this out, I think that this is not the case. The BLOCK_ASSUMED_VALID
status is only removed when the block is connected to the chain and raised to BLOCK_VALID_SCRIPTS
. However, nTx
and nChainTx
are updated from their fake to their actual values in AcceptBlock
/ ReceivedBlockTransactions
, which happens before that. So during the period where we receive blocks for the background chain in some, possibly random, order, there will be a mix of blocks with updated nTx
and fake nTx
ones, so that the condition could fail everywhere in the range of assumed-valid blocks.
For that, we'd need a chain where some blocks (at least the snapshot block, possibly also others) have additional transactions.
It seems not very clean to update the regtest chainparams everytime we want to test a snapshot with a slightly different chain.
Not sure if this has been discussed before, but maybe it could make sense to add a RPC that allows us to register an entry to m_assumeutxo_data
dynamically (just for regtest of course).
See #29299 for a fix (only changing the doc and moving the check into a proper section).
So during the period where we receive blocks for the background chain in some, possibly random, order, there will be a mix of blocks with updated
nTx
and fakenTx
ones, so that the condition could fail everywhere in the range of assumed-valid blocks.
Thanks for figuring this out and trying this. ~Trying to put this all together it seems like these are the cases:~
~1. Case where IsValid()
is true. Then nChainTx
must be prev nChainTx
+ nTx
.~
~2. Case where nTx
is 0. Then nChainTx
must be 0 because the block doesn't have transactions yet.~
~3. Case where prev nChainTx
is 0. Then nChainTx
must be 0 because some ancestor block doesn't have transactions yet.~
~4. Case where IsAssumedValid()
is true. Probably the only meaningful thing to check in this case is that nChainTx
is greater than prev nChainTx
. This should always be true except if prev IsValid()
is true and this is not the snapshot block. In that case, there will be a discontinuity and nChainTx
will decrease because it is a bogus value following a real value.~
EDIT: There are a number of mistakes in the suggestion above. It would be good to check for increasing nChainTx values, but the breakdown above is not correct. The following checks do seem to work, though:
if (!pindex->pprev) {
// If there's no previous block, nChainTx should be set to the number
// of transactions in this block.
assert(pindex->nChainTx == pindex->nTx);
} else if (pindex->IsAssumedValid()) {
// If block is assumed valid, nChainTx could be faked, but should at
// least be increasing between blocks. The only exception is
// assumed-valid blocks that are not the snapshot block and don't
// have transactions, directly following blocks that do have
// transactions. The faked nChainTx values in these will probably be
// less than the non-faked values in the previous blocks.
assert(pindex->nChainTx > pindex->pprev->nChainTx || (!pindex->IsValid() && pindex->pprev->IsValid() && pindex != GetSnapshotBaseBlock()));
} else if (pindex->nTx == 0 || pindex->pprev->nChainTx == 0) {
// If block is missing transactions, or any parent block is,
// nChainTx should be unset.
assert(pindex->nChainTx == 0);
} else {
// For normal blocks, nChainTx should be set to the sum of
// transactions in this and previous blocks.
assert(pindex->nChainTx == pindex->pprev->nChainTx + pindex->nTx);
}
The following checks do seem to work, though:
Pull requests welcome
Let me know if I should reopen this issue?
Let me know if I should reopen this issue?
It's good to close. Extending the assert with https://github.com/bitcoin/bitcoin/issues/29261#issuecomment-1908513362 would only be a marginal improvement. I think a better long term fix would be to get rid of fake nTx
and nChainTx
values as described in https://github.com/bitcoin/bitcoin/issues/29328#issuecomment-1914922108 "In the long run...".
// If block is assumed valid, nChainTx could be faked, but should at least be increasing between blocks.
I'm not completely convinced this is always true - I might well be wrong (didn't test it), but imagine the following scenario during the background sync:
In the beginning, all relevant blocks are assumed valid and have fake values for nTx
and nChainTx
.
Then we receive an out-of order block of height h
(that doesn't connect yet) -> we set nTx
and nChainTx
.
-> Everything is ok, even though nChainTx may decrease, the checks succeeds because the block of height h+1
may have a small nChainTx, but IsValid()
is false.
But next, imagine we receive the block at height h-1
(that doesn't connect to the tip either) with a large tx count. We'd update nChainTx
for that block, but I don't see how we would then also update the block at height h
in ReceivedBlockTransactions
(because it won't be in m_blocks_unlinked
if it has data). As a result the assert fail because we now have a decreasing nChainTx
between two valid blocks.
re: https://github.com/bitcoin/bitcoin/issues/29261#issuecomment-1917606474
Then we receive an out-of order block of height h (that doesn't connect yet) -> we set nTx and nChainTx.
I was going to reply that in this case we should only set nTx
not nChainTx
, but this does not appear to be true. The code that is trying to check for this case:
https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L3604-L3605
just assumes that if the previous block's nChainTx is nonzero then it is valid. So it seems you are right and in this case nChainTx will be assigned a nonsense value, and keep that value until the node is restarted and nChainTx is recomputed?
So it seems like there is a larger bug here, and the comment there equating the condition pindexNew->pprev->HaveNumChainTxs()
with "all parents are BLOCK_VALID_TRANSACTIONS" is not correct in the presence of fake values.
So it seems like there is a larger bug here
Yes, that could be the case. Since I don't see how this can correct without restarting after the blocks are connected to the chain and no longer assumed valid, I wonder if that could make the existing check https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L4949 fail.
Since it is very slow to sync with -checkblockindex=1
even on signet (though commit https://github.com/bitcoin/bitcoin/pull/28339/commits/87d6155b91797d795433c9c8bca5da20c74f810c from #28339 could help with that) there is a good chance that no one has ever tried it. Alternatively, a functional test could be written.
test
Good idea.
On top of https://github.com/bitcoin/bitcoin/pull/29354 , the following diff should crash the node:
diff --git a/test/functional/feature_assumeutxo.py b/test/functional/feature_assumeutxo.py
index 528680f2ca..e9d74ea132 100755
--- a/test/functional/feature_assumeutxo.py
+++ b/test/functional/feature_assumeutxo.py
@@ -165,6 +165,14 @@ class AssumeutxoTest(BitcoinTestFramework):
self.mini_wallet.send_self_transfer(from_node=n0)
self.generate(n0, nblocks=1, sync_fun=self.no_op)
newblock = n0.getblock(n0.getbestblockhash(), 0)
+ if i == 4:
+ # Create a stale block
+ temp_invalid = n0.getbestblockhash()
+ n0.invalidateblock(temp_invalid)
+ stale_hash = self.generateblock(n0, output="raw(aaaa)", transactions=[], sync_fun=self.no_op)["hash"]
+ n0.invalidateblock(stale_hash)
+ n0.reconsiderblock(temp_invalid)
+ stale_block = n0.getblock(stale_hash, 0)
# make n1 aware of the new header, but don't give it the block.
n1.submitheader(newblock)
@@ -215,6 +223,12 @@ class AssumeutxoTest(BitcoinTestFramework):
assert_equal(n1.getblockchaininfo()["blocks"], SNAPSHOT_BASE_HEIGHT)
+ self.log.info("Sumbit a stale block")
+ n1.submitblock(stale_block)
+ n1.getchaintips()
+ n1.getblock(stale_hash)
+ #assert False
+
self.log.info("Submit a spending transaction for a snapshot chainstate coin to the mempool")
# spend the coinbase output of the first block that is not available on node1
spend_coin_blockhash = n1.getblockhash(START_HEIGHT + 1)
re: https://github.com/bitcoin/bitcoin/issues/29261#issuecomment-1918947945
On top of #29354 , the following diff should crash the node:
This is a great test, thank you! I added it to #29370 and it uncovered a new bug that PR introduced to the CheckBlockIndex code (forgetting to reset pindexFirstNeverProcessed after moving upwards from the snapshot node).
re: https://github.com/bitcoin/bitcoin/issues/29261#issuecomment-1917639308
So it seems like there is a larger bug here
Looking into this more, I think the bug where pprev->HaveNumChainTxs
can return true on this line even when "all parents are BLOCK_VALID_TRANSACTIONS" is false would be pretty hard to trigger and the consequences would not be too bad.
https://github.com/bitcoin/bitcoin/blob/cad2df24b396be403f13f372ec48ea14a90d7f06/src/validation.cpp#L3604-L3605
The main consequence would be blocks winding up with nonsense nChainTx values based on sums with previous blocks fake values, which could trigger the CheckBlockIndex failure in marco's test https://github.com/bitcoin/bitcoin/issues/29261#issuecomment-1918947945.
But other than that, it shouldn't matter because fork blocks before the snapshot block and ahead of the background chain tip will ignored by TryAddBlockIndexCandidate, so it is harmless if that is incorrectly called on them. And non-fork blocks before the snapshot block and ahead of the background chain tip will just get added to setBlockIndexCandidates
instead of m_blocks_unlinked
, which is fine because FindMostWorkChain
should move them back to m_blocks_unlinked
. As long as this happens the blocks should eventually be validated and the nChainTx
values should get corrected.
So the "or all parents are BLOCK_VALID_TRANSACTIONS" comment is incorrect, but it shouldn't matter too much. #29370 should make this all more straightforward though.