Performance analysis of DATASET_PARENTS=512
The ProgPoW software audit recommends increasing the DATASET_PARENTS Ethash cache parameter from 256 to 512. This has a direct impact on verification performance: the time for a single verification nearly doubles (1960 us to 3695 us in the benchmarks below), whereas the ProgPoW verification slowdown over Ethash is only 30-50%.
The DATASET_PARENTS increase makes the verification "even more" memory hard and lowers the instructions-per-cycle ratio from 1.26 to about 1.05 (the max being 4).
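To illustrate why the parameter scales verification cost, here is a toy sketch of the dataset item computation as described in the Ethash specification: each item is mixed from DATASET_PARENTS pseudo-randomly indexed cache reads. This is not the benchmarked code. The helper names (`h512`, `make_toy_cache`), the tiny cache and the read counter are hypothetical, and `hashlib.sha3_512` is only a stand-in for the original Keccak-512 that Ethash uses, so the outputs do not match real Ethash values.

```python
import hashlib
import struct

FNV_PRIME = 0x01000193
WORDS = 16  # one 64-byte cache item viewed as sixteen 32-bit words


def fnv(a, b):
    """FNV-style 32-bit mixing function from the Ethash specification."""
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF


def h512(words):
    """Stand-in hash (assumption): NIST SHA3-512, NOT the Keccak-512 of Ethash."""
    digest = hashlib.sha3_512(struct.pack("<16I", *words)).digest()
    return list(struct.unpack("<16I", digest))


def make_toy_cache(n_items):
    """Hypothetical tiny cache; the real one holds hundreds of thousands of items."""
    item = h512([0] * WORDS)
    cache = []
    for _ in range(n_items):
        cache.append(item)
        item = h512(item)
    return cache


def calc_dataset_item(cache, i, parents):
    """Return (item, number of cache reads) for dataset index i."""
    n = len(cache)
    mix = list(cache[i % n])  # initial cache read
    reads = 1
    mix[0] ^= i
    mix = h512(mix)
    for j in range(parents):
        # Data-dependent index: the next read cannot be issued early.
        idx = fnv(i ^ j, mix[j % WORDS]) % n
        mix = [fnv(a, b) for a, b in zip(mix, cache[idx])]
        reads += 1
    return h512(mix), reads


toy_cache = make_toy_cache(1024)
for parents in (256, 512):
    _, reads = calc_dataset_item(toy_cache, 5, parents)
    print(f"DATASET_PARENTS={parents}: {reads} cache reads per dataset item")
```

Because every index depends on the previous mix, the reads form a serial chain of random accesses, which is consistent with the drop in instructions per cycle reported here.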
ProgPoW verification, DATASET_PARENTS = 256, epoch 0:
cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10825
2019-09-10 14:19:50
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark              Time           CPU   Iterations
------------------------------------------------------
progpow_hash/0      1960 us       1960 us          347
Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':
65 642 783 cache-references
39 184 374 cache-misses # 59,693 % of all cache refs
5 636 657 996 cycles
7 104 679 821 instructions # 1,26 insn per cycle
1,314309256 seconds time elapsed
1,296116000 seconds user
0,000000000 seconds sys
ProgPoW verification, DATASET_PARENTS = 512, epoch 0:
cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10697
2019-09-10 14:19:26
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark              Time           CPU   Iterations
------------------------------------------------------
progpow_hash/0      3695 us       3694 us          195
Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':
87 073 601 cache-references
48 426 695 cache-misses # 55,616 % of all cache refs
6 589 826 522 cycles
6 898 095 482 instructions # 1,05 insn per cycle
1,534862112 seconds time elapsed
1,512262000 seconds user
0,004011000 seconds sys
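For a quick side-by-side, the ratios can be recomputed from the two runs above. This is a small sketch using only the numbers printed by the benchmark and perf; the dictionary layout and function name are my own. Note that the perf counters cover the whole benchmark process and the two runs executed different iteration counts (347 vs 195), so only the ratios are comparable, not the raw totals.

```python
# Figures copied from the two perf/benchmark runs above.
runs = {
    256: {
        "time_us": 1960,
        "cache_refs": 65_642_783,
        "cache_misses": 39_184_374,
        "cycles": 5_636_657_996,
        "instructions": 7_104_679_821,
    },
    512: {
        "time_us": 3695,
        "cache_refs": 87_073_601,
        "cache_misses": 48_426_695,
        "cycles": 6_589_826_522,
        "instructions": 6_898_095_482,
    },
}


def derived(run):
    """Instructions per cycle and cache miss rate for one run."""
    return {
        "ipc": run["instructions"] / run["cycles"],
        "miss_rate": run["cache_misses"] / run["cache_refs"],
    }


slowdown = runs[512]["time_us"] / runs[256]["time_us"]

for parents, run in runs.items():
    d = derived(run)
    print(f"DATASET_PARENTS={parents}: "
          f"IPC={d['ipc']:.2f}, miss rate={d['miss_rate']:.1%}")
print(f"per-hash slowdown: {slowdown:.2f}x")
```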
How about increasing the size of the DAG cache instead, above Ethereum's current curve, at the time of the switch to ProgPoW? Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)
> Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)
That does not seem to be the case. From my observations, verification time depends strongly on the access time to the Ethash cache, and therefore on the CPU's L3 cache size. The larger the part of the Ethash cache that does not fit into L3, the slower verification becomes.
I used the word "significantly" specifically to account for the potential slight slowdown from the lower L3 cache hit rate. The DAG cache is already in excess of typical CPUs' L3 cache sizes (although those are increasing as well). In my experience (not with Ethash/ProgPoW, though), while L3 cache is a lot faster than RAM in synthetic benchmarks designed to fit in the cache, it provides little speedup for non-trivial algorithms - e.g., for yescrypt on a typical server platform there's little reduction in bandwidth when going from 16 MiB to higher sizes (even when I tweak it to reduce the amount of computation so that it could potentially use more bandwidth with the lower sizes). I've even seen cases where L3 cache hurt performance, compared to reading non-cached data from RAM, when the data happened to be cached in a CPU in a different socket.
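To put numbers on the "already in excess of L3" point, here is a sketch of the cache size formula as I understand it from the Ethash specification (16 MiB initial size, 128 KiB growth per epoch, rounded down so the item count is prime), compared against the 8192K L3 of the benchmark machine above. The constant and function names follow the spec's naming, but treat the sketch as illustrative rather than as the reference implementation; the epochs printed are arbitrary examples.

```python
HASH_BYTES = 64               # size of one cache item
CACHE_BYTES_INIT = 2 ** 24    # 16 MiB at epoch 0
CACHE_BYTES_GROWTH = 2 ** 17  # 128 KiB added per epoch


def is_prime(n):
    """Trial division; fast enough for cache item counts."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def cache_size(epoch):
    """Ethash cache size in bytes for a given epoch number."""
    size = CACHE_BYTES_INIT + CACHE_BYTES_GROWTH * epoch - HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= 2 * HASH_BYTES
    return size


L3_BYTES = 8192 * 1024  # L3 size reported by the benchmark above

for epoch in (0, 100, 300):
    size = cache_size(epoch)
    print(f"epoch {epoch}: cache {size / 2**20:.1f} MiB, "
          f"{size / L3_BYTES:.1f}x the 8 MiB L3")
```

Even at epoch 0 the cache is roughly twice the L3 of that machine, so the benchmarks above were already running with a working set that does not fit in L3.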