perf(verification): improve tx verification speed [POC]
This PR is an experiment in parallelizing transaction verification in order to decrease its running time. It should not be merged as is, but it's ready to be transformed into production PRs.
Specifically, it improves only script verification by verifying the scripts from all inputs of a single transaction in parallel. These implementations were compared:
- CURRENT: the current implementation in master.
- PYTHON_MULTITHREAD: a new implementation parallelizing script verification using multithreading in Python.
- PYTHON_MULTIPROCESS: a new implementation parallelizing script verification using multiprocessing in Python.
- RUST_MULTITHREAD: a new implementation parallelizing script verification by implementing a P2PKH executor in Rust, also with multithreading.
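As a rough illustration of the PYTHON_MULTITHREAD approach, verifying all input scripts of one transaction could be fanned out over a thread pool. This is only a sketch: `verify_input` and `verify_inputs_parallel` are hypothetical names, not the actual hathor-core API.

```python
from concurrent.futures import ThreadPoolExecutor

def verify_input(tx_input):
    # Stand-in for real script verification; in practice this would
    # execute the input's script against the spent output's script.
    # Threads can help because signature checks in C-backed crypto
    # libraries typically release the GIL.
    return tx_input % 2 == 0  # dummy check for illustration only

def verify_inputs_parallel(tx_inputs, max_workers=8):
    # Verify the scripts from all inputs of a single transaction in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(verify_input, tx_inputs))
    return all(results)
```

The same structure applies to the multiprocess variant by swapping in `ProcessPoolExecutor`, though each task then pays serialization and IPC costs.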
Note that I ran the benchmarks considering the worst case, that is, a transaction with 255 inputs. Also, for the Rust implementation, only P2PKH scripts are currently supported. Other types, including non-standard scripts, fall back to the current Python implementation (and therefore are not considered in any benchmarks).
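The dispatch between the Rust executor and the Python fallback might look like the following sketch. Everything here is hypothetical (the function names, the template check, and the stand-in implementations); it only illustrates the "P2PKH goes to Rust, everything else falls back" shape described above.

```python
def is_p2pkh(script: bytes) -> bool:
    # Hypothetical detection: standard P2PKH scripts start with a fixed
    # opcode template (OP_DUP OP_HASH160 ...). Real code would parse opcodes.
    return script.startswith(b"\x76\xa9")

def verify_p2pkh_rust(script: bytes) -> bool:
    # Stand-in for a call into the Rust executor binding.
    return True

def verify_python(script: bytes) -> bool:
    # Stand-in for the existing Python script verifier.
    return True

def verify_script(script: bytes) -> bool:
    if is_p2pkh(script):
        # Fast path: supported by the Rust executor.
        return verify_p2pkh_rust(script)
    # Non-P2PKH and non-standard scripts fall back to the Python path.
    return verify_python(script)
```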
Initially I ran the benchmarks for 1,000 transactions (each with 255 inputs), for all implementations. The Python multithreaded implementation, which is a quick win, yields a ~2.5x reduction in processing time compared to the baseline, current implementation. The Rust implementation yields a ~12x reduction in processing time, or almost 5 times faster than the multithreaded Python. The Python multiprocessing implementation is completely unsuitable for this application, yielding processing times 4x slower than the current implementation, or ~50x slower than the Rust implementation. Each benchmark ran 2 times.
Then, I ran it for 10,000 transactions, but excluding the Python multiprocess implementation as it's already discarded. Each benchmark ran once.
Specific results are in the tables below. The code for reproducing these results is contained in this PR, except for the Rust implementation.
Verifying 1,000 transactions with 255 inputs each
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| CURRENT | 148.670 ± 1.092 | 147.898 | 149.443 | 11.87 ± 0.11 |
| RUST_MULTITHREAD | 12.521 ± 0.078 | 12.467 | 12.576 | 1.00 |
| PYTHON_MULTITHREAD | 59.309 ± 0.735 | 58.790 | 59.828 | 4.74 ± 0.07 |
| PYTHON_MULTIPROCESS | 594.714 ± 25.360 | 576.781 | 612.646 | 47.50 ± 2.05 |
Verifying 10,000 transactions with 255 inputs each
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| RUST_MULTITHREAD | 121.682 | 121.682 | 121.682 | 1.00 |
| PYTHON_MULTITHREAD | 558.422 | 558.422 | 558.422 | 4.59 |
| CURRENT | 1454.168 | 1454.168 | 1454.168 | 11.95 |
Bencher
| Report | Mon, September 2, 2024 at 21:31:26 UTC |
|---|---|
| Project | hathor-core |
| Branch | glevco/verification-experiments |
| Testbed | ubuntu-22.04 |
| Benchmark | Latency | Latency Results, nanoseconds (ns) (Δ%) | Latency Lower Boundary, ns (%) | Latency Upper Boundary, ns (%) |
|---|---|---|---|---|
| sync-v2 (up to 20000 blocks) | ✅ | 101,435,809,019.00 (-0.80%) | 92,030,274,231.18 (90.73%) | 112,481,446,282.55 (90.18%) |
What about tests with 1 or 2 inputs? Since 255 inputs is the worst case for verification, it's also probably the best case for optimization gains, but it's not the most common case by far. My reasoning is this:
- If tests with few inputs show no significant slowdown, then it's a no-brainer and this is an absolute improvement (I don't expect this)
- If tests with few inputs are significantly slower because of any added overhead, then we might want to consider keeping both methods and look for the threshold N at which transactions with N inputs start to benefit (this might not be the case, but I'm not sure)
- If tests with few inputs are only slightly slower, then I think it's probably not worth any additional complexity. I'm not sure what "slightly slower" would mean exactly, but I guess we can find out (this is closer to what I would expect, and I think it's worth finding out)