[IBD] specialize CheckBlock's input & coinbase checks
This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
CheckBlock's latency is critical for efficiently validating correct inputs during transaction validation, including mempool acceptance and new block creation.
This PR improves performance and maintainability by introducing the following changes:
- Simplified checks for the most common cases (1 or 2 inputs; 70-90% of transactions have a single input).
- Optimized the general case by replacing `std::set` with a sorted `std::vector` for improved locality.
- Simplified null prevout checks from linear to constant time.
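To make the container swap in the second bullet concrete, here is a minimal sketch (plain ints instead of `COutPoint`s, hypothetical function names, not the PR diff): both functions detect duplicates, but the sorted-vector version avoids per-element node allocations.

```cpp
// Sketch only: illustrates std::set vs sorted std::vector duplicate detection.
#include <algorithm>
#include <set>
#include <vector>

bool HasDuplicateViaSet(const std::vector<int>& values)
{
    std::set<int> seen; // node-based: one heap allocation per insert, scattered memory
    for (const int v : values) {
        if (!seen.insert(v).second) return true; // insert fails on a duplicate
    }
    return false;
}

bool HasDuplicateViaSortedVector(std::vector<int> values) // taken by value: sorting mutates
{
    std::sort(values.begin(), values.end()); // contiguous memory, cache-friendly
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```

For the handful of inputs a typical transaction has, the sort tends to be cheaper than repeated tree inserts, and the data stays in one contiguous allocation.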
Context, concerns
As noted by @sipa in https://github.com/bitcoin/bitcoin/pull/31682#pullrequestreview-2570273162, this is consensus-critical code (especially since we've had an actual inflation bug related to duplicate checking).
However, the changes here are confined to a single function and easily reviewed, and I have a general preference for avoiding std::set when a sorted std::vector works just as well.
The goal of this change is to make block validation faster via a low-risk alternative.
Measurements
C++ compiler .......................... AppleClang 16.0.0.16000026
Before:
| ns/block | block/s | err% | total | benchmark |
|---|---|---|---|---|
| 372,743.63 | 2,682.81 | 1.1% | 10.99 | CheckBlockBench |
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 3,304,694.54 | 302.60 | 0.5% | 11.05 | DuplicateInputs |
| 9,585.10 | 104,328.55 | 0.1% | 11.03 | ProcessTransactionBench |
After:
| ns/block | block/s | err% | total | benchmark |
|---|---|---|---|---|
| 179,971.00 | 5,556.45 | 0.3% | 11.02 | CheckBlockBench |
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 963,177.98 | 1,038.23 | 1.7% | 10.92 | DuplicateInputs |
| 9,410.90 | 106,259.75 | 0.3% | 11.01 | ProcessTransactionBench |
- CheckBlockBench: 2.07x faster
- DuplicateInputs: 3.4x faster
- ProcessTransactionBench: 1.8% faster
C++ compiler .......................... GNU 13.3.0
Before:
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 1,096,261.84 | 912.19 | 0.1% | 7,963,390.88 | 3,487,375.26 | 2.283 | 1,266,941.00 | 1.8% | 11.03 | CheckBlockBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 8,366,309.48 | 119.53 | 0.0% | 23,865,177.67 | 26,620,160.23 | 0.897 | 5,972,887.41 | 4.0% | 10.78 | DuplicateInputs |
| 56,199.57 | 17,793.73 | 0.1% | 229,263.01 | 178,766.31 | 1.282 | 15,509.97 | 0.5% | 10.91 | ProcessTransactionBench |
After:
| ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 834,855.94 | 1,197.81 | 0.0% | 6,518,548.86 | 2,656,039.78 | 2.454 | 919,160.84 | 1.5% | 10.78 | CheckBlockBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 4,261,492.75 | 234.66 | 0.0% | 17,379,823.40 | 13,559,793.33 | 1.282 | 4,265,714.28 | 3.4% | 11.00 | DuplicateInputs |
| 55,819.53 | 17,914.88 | 0.1% | 227,828.15 | 177,520.09 | 1.283 | 15,184.36 | 0.4% | 10.91 | ProcessTransactionBench |
- CheckBlockBench: 31% faster
- DuplicateInputs: 2x faster
- ProcessTransactionBench: 0.5% faster
While the point of the change isn't necessarily to speed up IBD, you can see the IBD measurements at https://github.com/bitcoin/bitcoin/pull/31682#issuecomment-2606765689
Related PRs:
- https://github.com/bitcoin/bitcoin/pull/14397 - a very similar solution, which I only found after opening this PR (closed for inactivity)
- https://github.com/bitcoin/bitcoin/pull/14837 - a faster but considerably more complex alternative (closed for complexity and inactivity)
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31682.
Reviews
See the guideline for information on the review process.
| Type | Reviewers |
|---|---|
| Concept NACK | 1440000bytes |
| Concept ACK | sipa |
| Approach ACK | Raimo33 |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #33629 (Cluster mempool by sdaftuar)
- #32729 (test,refactor: extract script template helpers & widen sigop count coverage by l0rinc)
- #32554 (bench: replace embedded raw block with configurable block generator by l0rinc)
- #31868 ([IBD] specialize block serialization by l0rinc)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35818754067
Hints
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
- Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
- A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
- An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
How much of an improvement does this translate into for a user? The bar should be pretty high for touching such critical code.
> The bar should be pretty high for touching such critical code.
Absolutely, every change should have an extremely high bar here. For context, compared to the changes described in https://bitcoincore.org/en/2018/09/20/notice, this update doesn’t involve caching or reusing previously calculated values elsewhere. Instead, it operates entirely within the method, maintaining a black-box approach that ensures the exact same behavior. To validate this, I’ve added an extensive suite of tests that thoroughly cover these cases.
> How much of an improvement does this translate into for a user?
The primary goal of this change is to improve block validation performance while maintaining a low risk profile. The PR description includes detailed measurements of the improvements. If there are additional benchmarks or scenarios you’d like me to measure, please let me know, and I’ll be happy to provide them.
> it is called whenever a tx is validated (so also during mempool acceptance)
Updated the description, thanks.
> how much this improves transaction validation as a whole
I can only optimize small parts of these, step by step. This was the next part I found that could be cheaply sped up (it already had tests and benchmarks, signaling that others were also interested in it).
I have created a chainman.ProcessTransaction benchmark locally (will commit separately, not strictly related to this PR) to check the overall effect, but as you've stated, this is a small part of the whole process, so I regularly got only a roughly 1% speedup. Not huge, but we have to decide whether it's proportional to the risk or not.
ProcessTransaction results
```bash
git checkout bf090135d01076aa6d4fbfe7a185682dd6f9f489 >/dev/null 2>&1 \
&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000 \
&& git checkout f25b91cd5a50e1035bbdb884ab11858702fadf33 >/dev/null 2>&1 \
&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000
```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 56,570.90 | 17,676.93 | 0.1% | 229,118.71 | 179,945.97 | 1.273 | 15,499.48 | 0.5% | 11.03 | ProcessTransactionBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 56,152.78 | 17,808.56 | 0.0% | 228,009.18 | 178,596.02 | 1.277 | 15,224.76 | 0.4% | 10.92 | ProcessTransactionBench |
Similarly, I ran a few reindex-chainstates (as proxies for a stable IBD) and, as posted in the description, got a similar ~1% speedup (2 runs per bench: 19923 & 19933 seconds before, and 19681 and 19737 seconds after).
Not earth-shattering globally, but I think the risk (given all the tests and benchmarks) is relatively low as well.
> I regularly got only a roughly 1% speedup. Not huge, but we have to decide whether it's proportional to the risk or not.

> got a similar ~1% speedup (2 runs per bench: 19923 & 19933 seconds before, and 19681 and 19737 seconds after).
NACK
@1440000bytes, this behavior isn't helpful - you're just reiterating what I've already explained, without suggesting workable solutions.
Anyway, I've created an alternative PR that focuses solely on the tests and benchmarks, excluding the controversial optimizations. It validates the affected code and removes the microbenchmarks that overemphasize this segment's significance, replacing them with a higher-level one, as suggested by @mzumsande.
> @1440000bytes, this behavior isn't helpful - you're just reiterating what I've already explained, without suggesting workable solutions.
I don't think optimization is worth changing consensus code.
~~I pushed a new version based on https://github.com/bitcoin/bitcoin/pull/31699, to separate the testing/benching from the consensus change. This will remain in draft to gather comments.~~
I've also retriggered an IBD (simulated via multiple -reindex-chainstate runs for stability).
Here I've compared the performance of this PR after applying my other IBD optimizations (https://github.com/bitcoin/bitcoin/pull/31144 and https://github.com/bitcoin/bitcoin/pull/31645) to measure the future potential of this change. I ran it up to block 800k this time to avoid the validation overhead.
Benchmark results
```bash
hyperfine --runs 2 --parameter-list COMMIT dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a,053ccbefa3563e776932bb847aae9c302d07de61 --prepare 'rm -rfd /mnt/my_storage/BitcoinData/debug.log build/ && git checkout {COMMIT} && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' --cleanup 'mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s.log)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0'
```

```
Benchmark 1: COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
  Time (mean ± σ):     12537.220 s ± 7.591 s    [User: 13362.903 s, System: 523.533 s]
  Range (min … max):   12531.852 s … 12542.588 s    2 runs

Benchmark 2: COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
  Time (mean ± σ):     12251.387 s ± 52.389 s    [User: 13169.814 s, System: 543.757 s]
  Range (min … max):   12214.343 s … 12288.432 s    2 runs

Summary
  COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0 ran
    1.02 ± 0.00 times faster than COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
```
This reveals a ~2.5% speedup (taking the best runs out of 2).
NACK https://github.com/bitcoin/bitcoin/commit/0a40b8c8a09344c1ef9167be2f357da7a551d65d
Reason: https://github.com/bitcoin/bitcoin/pull/31682#issuecomment-2604902068
I don't like your behavior of repeatedly pushing minimal changes that affect consensus code
@1440000bytes keep your comments on topic. the topic is technical. comments about people are not relevant.
> @1440000bytes keep your comments on topic. the topic is technical. comments about people are not relevant.
It is on topic and a response to https://github.com/bitcoin/bitcoin/pull/31682#issuecomment-2604837850.
I reviewed the technical aspects before others did.
Everyone needs to cool down on this pull request. 24-hour bans are next for users that mention people and not code. After that, it's 7-day bans.
Merge this pull request if my comments are so off topic.
Interesting how std::vector is almost always the best-performing container, even when uniqueness or sorting is required.
std::vector is probably less likely to have bugs than std::set, so there's an improvement in that regard as well.
I feel like we shouldn't need to fully sort, though. Perhaps a custom check would improve performance more significantly? (OTOH, the "normal flow" scenario might end up being a full sort anyway, unsure)
(Side-note: In theory std::flat_set constructor (4) (https://en.cppreference.com/w/cpp/container/flat_set/flat_set) could be used as a drop-in replacement to do the sort+uniq, but this requires C++23 and isn't even implemented in any released standard lib and would only turn two calls into one.)
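For illustration, a sketch of that drop-in, assuming a C++23 standard library that ships `std::flat_set` (none of the ones we target does yet, as the side-note says); constructor (4) adopts the container, then sorts and deduplicates it in one step:

```cpp
// Illustrative only: hypothetical duplicate check via C++23 std::flat_set.
#include <flat_set>
#include <vector>

bool HasDuplicate(std::vector<int> values)
{
    const auto original_size = values.size();
    std::flat_set<int> unique(std::move(values)); // constructor (4): sort + unique in one call
    return unique.size() != original_size;        // a shrunken size means a duplicate was dropped
}
```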
> I feel like we shouldn't need to fully sort, though
We likely wouldn't need that just for duplicate checking, but this seems "good enough", since it's a built-in function - especially now that I have integrated it with null checks as well.
The code now separates (in many tiny commits to ease review) validation by input count:
- 1 input - it's either a coinbase, in which case we need neither the duplicate check nor the null-prevout check, or it isn't, and a single null check completes validation. Given that this is currently the most common case, it seemed important to treat it separately.
- 2 inputs - we check that the two prevouts are not equal and that neither is null.
- otherwise - we sort the prevouts, check duplicates via a built-in (possibly SIMD-optimized) method, and check for nulls at the head of the sorted values in constant time (we could also check for null during `sortedPrevouts` collection, but that would change the validation order for duplicates & nulls).
I've also extended the tests to cover random null check validations.
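Putting the three branches together, a rough sketch of the resulting shape, assuming Bitcoin Core's types (the helper name is made up; the real logic lives inside `CheckTransaction()`, and the error strings match the diff quoted later in this thread). Unlike the PR, the general case below scans for nulls linearly instead of using the constant-time check at the head of the sorted range:

```cpp
// Sketch only, not the PR diff.
#include <consensus/validation.h>
#include <primitives/transaction.h>

#include <algorithm>
#include <vector>

bool CheckInputsDuplicatesAndNulls(const CTransaction& tx, TxValidationState& state)
{
    if (tx.IsCoinBase()) return true; // no prevouts to deduplicate or null-check
    if (tx.vin.size() == 1) {
        // Most common case (70-90% of transactions): a single input can't be a duplicate.
        if (tx.vin[0].prevout.IsNull()) return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
        return true;
    }
    if (tx.vin.size() == 2) {
        // Second most common case: one equality check and two null checks suffice.
        if (tx.vin[0].prevout == tx.vin[1].prevout) return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
        if (tx.vin[0].prevout.IsNull() || tx.vin[1].prevout.IsNull()) return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
        return true;
    }
    // General case: sort a contiguous copy of the prevouts for cache-friendly duplicate detection.
    std::vector<COutPoint> sorted_prevouts;
    sorted_prevouts.reserve(tx.vin.size());
    for (const auto& in : tx.vin) sorted_prevouts.push_back(in.prevout);
    std::sort(sorted_prevouts.begin(), sorted_prevouts.end());
    if (std::adjacent_find(sorted_prevouts.begin(), sorted_prevouts.end()) != sorted_prevouts.end()) {
        return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
    }
    for (const auto& prevout : sorted_prevouts) {
        if (prevout.IsNull()) return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
    }
    return true;
}
```

Note the duplicate check precedes the null check in every branch, keeping the error order stable, which is why the null check isn't folded into the collection loop.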
I experimented a bit more and it looks like I got another 17% speed improvement (on block413567 on my linux machine) over your changes @l0rinc , by using a faster comparison operator and doing direct comparisons for relatively small input sizes. Can I create a new pull request for that?
> by using a faster comparison operator and doing direct comparisons for relatively small input sizes.
Quadratic comparison would likely be faster than sorting for very small input sizes (e.g. <10, I guess), but it would complicate the code and testing considerably for a case that is likely not representative. I also thought of that but I want the code to be faster while not being more difficult to comprehend. But feel free to challenge my bias by pushing a different PR with those changes and let the best approach win! You can cherry-pick it on top of these changes if you want.
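For reference, the quadratic variant under discussion might look like the following sketch; the 17%-faster branch itself isn't shown in this thread, so the shape (and the fact that a small-size cutoff would guard it) are assumptions:

```cpp
// Hypothetical sketch of the O(n^2) small-n duplicate check.
#include <primitives/transaction.h> // COutPoint, from Bitcoin Core

#include <vector>

bool HasDuplicateQuadratic(const std::vector<COutPoint>& prevouts)
{
    // Pairwise comparisons: no allocation, no sort, and branch-friendly
    // for the tiny input counts where it would be enabled.
    for (size_t i = 0; i + 1 < prevouts.size(); ++i) {
        for (size_t j = i + 1; j < prevouts.size(); ++j) {
            if (prevouts[i] == prevouts[j]) return true;
        }
    }
    return false;
}
```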
Rebased to fix conflicts with https://github.com/bitcoin/bitcoin/pull/33116
Concept ACK.
will review soon. I'm a fan of the simpler std::vector approach.
> The bar should be pretty high for touching such critical code.
I agree, but I'd argue the changes proposed are small enough that this should be seriously considered for merge. The diff makes it hard to grasp, but few lines actually changed, and for the best. I'm torn on cae4bb877f9ef1350825ed42a9b97090d5920d39, but for the rest: Concept ACK, Approach ACK, ACK d28302fa2c09c2eac3daad7116dc9812fe01928b.
I've run the tests and double-checked the logic.
```bash
taskset -c 1 ./bench_bitcoin --filter="(ProcessTransactionBench)" --min-time=10000
```
Before:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 52,941.54 | 18,888.76 | 0.1% | 10.79 | ProcessTransactionBench |
After:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 52,110.83 | 19,189.87 | 0.1% | 10.78 | ProcessTransactionBench |
Without cae4bb877f9ef1350825ed42a9b97090d5920d39:
```diff
-    } else if (tx.vin.size() == 2) {
-        if (tx.vin[0].prevout == tx.vin[1].prevout) {
-            return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
-        }
-        if (tx.vin[0].prevout.IsNull() || tx.vin[1].prevout.IsNull()) {
-            return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
-        }
-    }
```
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 52,778.63 | 18,947.06 | 0.1% | 10.77 | ProcessTransactionBench |
1.6% faster overall; only 0.3% faster without cae4bb877f9ef1350825ed42a9b97090d5920d39.
Looks like the gain comes from handling txs with 1-2 inputs separately.