perf(verification): improve tx verification speed [POC]
This PR is an experiment in parallelizing transaction verification in order to decrease its running time. It should not be merged as is, but it's ready to be transformed into production PRs.
Specifically, it improves only script verification by verifying the scripts from all inputs of a single transaction in parallel. These implementations were compared:
- CURRENT: the current implementation in master.
- PYTHON_MULTITHREAD: a new implementation parallelizing script verification using multithreading in Python.
- PYTHON_MULTIPROCESS: a new implementation parallelizing script verification using multiprocessing in Python.
- RUST_MULTITHREAD: a new implementation parallelizing script verification by implementing a P2PKH executor in Rust, also with multithreading.
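As a rough illustration of the PYTHON_MULTITHREAD approach, verifying all input scripts of one transaction could be fanned out over a thread pool. This is only a sketch: `verify_input` and `verify_inputs_parallel` are hypothetical names, not the actual hathor-core API.

```python
from concurrent.futures import ThreadPoolExecutor

def verify_input(tx_input):
    # Stand-in for real script verification; in practice this would
    # execute the input's script against the spent output's script.
    # Threads can help because signature checks in C-backed crypto
    # libraries typically release the GIL.
    return tx_input % 2 == 0  # dummy check for illustration only

def verify_inputs_parallel(tx_inputs, max_workers=8):
    # Verify the scripts from all inputs of a single transaction in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(verify_input, tx_inputs))
    return all(results)
```

The same structure applies to the multiprocess variant by swapping in `ProcessPoolExecutor`, though each task then pays serialization and IPC costs.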
Note that I ran the benchmarks considering the worst case, that is, a transaction with 255 inputs. Also, for the Rust implementation, only P2PKH scripts are currently supported. Other types, including non-standard scripts, fall back to the current Python implementation (and therefore are not considered in any benchmarks).
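The dispatch between the Rust executor and the Python fallback might look like the following sketch. Everything here is hypothetical (the function names, the template check, and the stand-in implementations); it only illustrates the "P2PKH goes to Rust, everything else falls back" shape described above.

```python
def is_p2pkh(script: bytes) -> bool:
    # Hypothetical detection: standard P2PKH scripts start with a fixed
    # opcode template (OP_DUP OP_HASH160 ...). Real code would parse opcodes.
    return script.startswith(b"\x76\xa9")

def verify_p2pkh_rust(script: bytes) -> bool:
    # Stand-in for a call into the Rust executor binding.
    return True

def verify_python(script: bytes) -> bool:
    # Stand-in for the existing Python script verifier.
    return True

def verify_script(script: bytes) -> bool:
    if is_p2pkh(script):
        # Fast path: supported by the Rust executor.
        return verify_p2pkh_rust(script)
    # Non-P2PKH and non-standard scripts fall back to the Python path.
    return verify_python(script)
```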
Initially I ran the benchmarks for 1,000 transactions (each with 255 inputs), for all implementations. The Python multithreaded implementation, which is a quick win, yields a ~2.5x reduction in processing time compared to the baseline, current implementation. The Rust implementation yields a ~12x reduction in processing time, or almost 5 times faster than the multithreaded Python. The Python multiprocessing implementation is completely unsuitable for this application, yielding processing times 4x slower than the current implementation, or ~50x slower than the Rust implementation. Each benchmark ran 2 times.
Then, I ran it for 10,000 transactions, but excluding the Python multiprocess implementation as it's already discarded. Each benchmark ran once.
Specific results are in the tables below. The code for reproducing these results is contained in this PR, except for the Rust implementation.
Verifying 1,000 transactions with 255 inputs each
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| CURRENT | 148.670 ± 1.092 | 147.898 | 149.443 | 11.87 ± 0.11 |
| RUST_MULTITHREAD | 12.521 ± 0.078 | 12.467 | 12.576 | 1.00 |
| PYTHON_MULTITHREAD | 59.309 ± 0.735 | 58.790 | 59.828 | 4.74 ± 0.07 |
| PYTHON_MULTIPROCESS | 594.714 ± 25.360 | 576.781 | 612.646 | 47.50 ± 2.05 |
Verifying 10,000 transactions with 255 inputs each
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| RUST_MULTITHREAD | 121.682 | 121.682 | 121.682 | 1.00 |
| PYTHON_MULTITHREAD | 558.422 | 558.422 | 558.422 | 4.59 |
| CURRENT | 1454.168 | 1454.168 | 1454.168 | 11.95 |
Bencher
| Report | Mon, September 2, 2024 at 21:31:26 UTC |
|---|---|
| Project | hathor-core |
| Branch | glevco/verification-experiments |
| Testbed | ubuntu-22.04 |
| Benchmark | Latency | Latency Results, nanoseconds (ns) (Δ%) | Latency Lower Boundary, ns (%) | Latency Upper Boundary, ns (%) |
|---|---|---|---|---|
| sync-v2 (up to 20000 blocks) | ✅ | 101,435,809,019.00 (-0.80%) | 92,030,274,231.18 (90.73%) | 112,481,446,282.55 (90.18%) |
What about tests with 1 or 2 inputs? Since 255 inputs is the worst case for verification, it's also probably the best case for optimization gains, but it's not the most common case by far. My reasoning is this:
- If tests with few inputs show no significant slowdown, then it's a no-brainer and this is an absolute improvement (I don't expect this)
- If tests with few inputs are significantly slower because of any added overhead, then we might want to consider keeping both methods and look for the threshold N at which transactions with N inputs start to benefit (this might not be the case, but I'm not sure)
- If tests with few inputs are only slightly slower, then I think it's probably not worth any additional complexity. I'm not sure what "slightly slower" would mean exactly, but I guess we can find out (this is closer to what I would expect, and I think it's worth finding out)