
perf(verification): improve tx verification speed [POC]

Open · glevco opened this issue 1 year ago · 2 comments

This PR is an experiment in parallelizing transaction verification in order to decrease its running time. It should not be merged as is, but it's ready to be turned into production PRs.

Specifically, it improves only script verification, by verifying the scripts from all inputs of a single transaction in parallel. These implementations were compared:

  • CURRENT: The current implementation in master.
  • PYTHON_MULTITHREAD: A new implementation that parallelizes script verification using multithreading in Python (a minimal sketch follows this list).
  • PYTHON_MULTIPROCESS: A new implementation that parallelizes script verification using multiprocessing in Python.
  • RUST_MULTITHREAD: A new implementation that parallelizes script verification through a P2PKH executor written in Rust, also with multithreading.
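
As a rough illustration of the PYTHON_MULTITHREAD approach, here is a minimal sketch. `verify_input_script` and the thread-pool wiring are hypothetical stand-ins for hathor-core's actual verification code, not the PR's real API:

```python
# Minimal sketch of the PYTHON_MULTITHREAD idea: verify all input scripts of
# one transaction concurrently. `verify_input_script` is a hypothetical
# stand-in for the existing per-input verification in hathor-core.
from concurrent.futures import ThreadPoolExecutor


def verify_input_script(tx, input_index: int) -> None:
    # Placeholder: execute the input's script against the spent output's
    # script and raise on failure. The heavy part (ECDSA signature checks)
    # runs in native code, which is why threads can help despite the GIL.
    raise NotImplementedError


def verify_all_scripts_parallel(tx) -> None:
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(verify_input_script, tx, i)
            for i in range(len(tx.inputs))
        ]
        for future in futures:
            future.result()  # re-raises the first verification error, if any
```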

Note that I ran the benchmarks considering the worst case, that is, a transaction with 255 inputs. Also, the Rust implementation currently supports only P2PKH scripts; other types, including non-standard scripts, fall back to the current Python implementation (and are therefore not exercised in any benchmark).
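
To make that fallback concrete, here is a hedged sketch of how the dispatch could look on the Python side. `is_p2pkh` and `rust_executor.verify_p2pkh_batch` are hypothetical names for a native (e.g. PyO3-based) extension, not the PR's actual interface:

```python
# Hedged sketch of the RUST_MULTITHREAD dispatch: P2PKH inputs are verified
# in a single batch call into a native Rust extension (which can run its own
# worker threads, outside the GIL), while all other script types fall back
# to the current Python implementation. All names here are hypothetical.

def is_p2pkh(tx_input) -> bool:
    # Placeholder: inspect the spent output's script to detect P2PKH.
    raise NotImplementedError


def verify_scripts_with_rust(tx, rust_executor, verify_input_script) -> None:
    p2pkh_indexes = []
    other_indexes = []
    for i, tx_input in enumerate(tx.inputs):
        (p2pkh_indexes if is_p2pkh(tx_input) else other_indexes).append(i)
    if p2pkh_indexes:
        # One call crosses the FFI boundary; the Rust side verifies the
        # whole batch on worker threads and raises on the first failure.
        rust_executor.verify_p2pkh_batch(tx, p2pkh_indexes)
    for i in other_indexes:
        verify_input_script(tx, i)  # current Python path (fallback)
```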

Initially I ran the benchmarks for 1,000 transactions (each with 255 inputs), for all implementations. The Python multithreaded implementation, which is a quick win, yields a ~2.5x reduction in processing time compared to the baseline, the current implementation. The Rust implementation yields a ~12x reduction in processing time, making it almost 5 times faster than the multithreaded Python. The Python multiprocessing implementation is completely unsuitable for this application, yielding processing times 4x slower than the current implementation, or ~50x slower than the Rust implementation, presumably because the cost of serializing data between processes dominates. Each benchmark ran 2 times.

Then I ran the benchmarks for 10,000 transactions, excluding the Python multiprocessing implementation since it had already been discarded. Each benchmark ran once.

Specific results are in the tables below. The code for reproducing these results is contained in this PR, except for the Rust implementation.

Verifying 1,000 transactions with 255 inputs each

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| CURRENT | 148.670 ± 1.092 | 147.898 | 149.443 | 11.87 ± 0.11 |
| RUST_MULTITHREAD | 12.521 ± 0.078 | 12.467 | 12.576 | 1.00 |
| PYTHON_MULTITHREAD | 59.309 ± 0.735 | 58.790 | 59.828 | 4.74 ± 0.07 |
| PYTHON_MULTIPROCESS | 594.714 ± 25.360 | 576.781 | 612.646 | 47.50 ± 2.05 |

Verifying 10,000 transactions with 255 inputs each

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| RUST_MULTITHREAD | 121.682 | 121.682 | 121.682 | 1.00 |
| PYTHON_MULTITHREAD | 558.422 | 558.422 | 558.422 | 4.59 |
| CURRENT | 1454.168 | 1454.168 | 1454.168 | 11.95 |

glevco · Sep 02 '24 21:09

🐰 Bencher

Report: Mon, September 2, 2024 at 21:31:26 UTC
Project: hathor-core
Branch: glevco/verification-experiments
Testbed: ubuntu-22.04

| Benchmark | Latency Results, ns (Δ%) | Latency Lower Boundary, ns (%) | Latency Upper Boundary, ns (%) |
|---|---|---|---|
| sync-v2 (up to 20000 blocks) ✅ | 101,435,809,019.00 (-0.80%) | 92,030,274,231.18 (90.73%) | 112,481,446,282.55 (90.18%) |


github-actions[bot] · Sep 02 '24 21:09

What about tests with 1 or 2 inputs? Since 255 inputs is the worst case for verification, it's probably also the best case for optimization gains, but it's far from the most common case. My reasoning is this:

  • If tests with few inputs show no significant slowdown, then it's a no-brainer and this is an absolute improvement (I don't expect this).
  • If tests with few inputs are significantly slower because of the added overhead, then we might want to keep both methods and look for the threshold N at which transactions with N inputs start to benefit (this might not be the case, but I'm not sure; see the sketch after this list).
  • If tests with few inputs are only slightly slower, then it's probably not worth any additional complexity. I'm not sure what "slightly slower" would be, but I guess we can find out (this is closer to what I would expect, and I think it's worth finding out).
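
A hedged sketch of the "keep both methods" idea from the second bullet; `PARALLEL_THRESHOLD` and the function names are hypothetical, and the threshold value would have to be found by benchmarking transactions with increasing input counts:

```python
# Hedged sketch of a hybrid dispatch: stay on the serial path for small
# transactions so they never pay the thread-pool overhead, and switch to
# the parallel path above an empirically determined threshold.
PARALLEL_THRESHOLD = 8  # assumption: to be determined by benchmarking


def verify_scripts(tx, verify_input_script, verify_all_scripts_parallel) -> None:
    if len(tx.inputs) < PARALLEL_THRESHOLD:
        for i in range(len(tx.inputs)):
            verify_input_script(tx, i)  # current serial path
    else:
        verify_all_scripts_parallel(tx)  # parallel path
```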

jansegre · Sep 18 '24 21:09