bls12-381
bls12-381 copied to clipboard
experimental wnaf type change + try flatten MultiExp input
This is experimental, just for illustration, it needs review, and maybe some of the changes are worthwhile.
The wnaf type:
- Didn't allocate enough when creating it, and keeps appending and re-allocating (from empty, up to 128, doubling allocation etc.)
- Uses
int
as element type, even though for the default window size, it could fit in aint8
, a 8x size reduction- Note: the benchmarks tried different window sizes. With int8, and the way the half size and full size works, you can't go higher than 6 if it needs to fit in
int8
, but that's fine. - If the window size was declared as constant (it's not only because of benchmarks), some allocations of slices here and there may just be declared as small fixed-size arrays that could fit in the stack? (Could maybe improve performance)
- Note: the benchmarks tried different window sizes. With int8, and the way the half size and full size works, you can't go higher than 6 if it needs to fit in
And MultiExp
has a []*PointG1
input, while I much prefer []PointG1
, since we allocate large arrays of these, and there is no use for an unnecessary extra layer of indirection. Or maybe an alternative function could be exposed, with this input type?
Benchmark:
goos: linux
goarch: amd64
pkg: github.com/kilic/bls12-381
cpu: AMD Ryzen 9 5950X 16-Core Processor
old new old new
BenchmarkG1MulWNAF/Naive-32 9961 8464 120897 ns/op 121713 ns/op
BenchmarkG1MulWNAF/Fr,_window:_1-32 10000 11935 119168 ns/op 104580 ns/op
BenchmarkG1MulWNAF/Big,_window:_1-32 7494 10000 148903 ns/op 140881 ns/op
BenchmarkG1MulWNAF/Fr,_window:_2-32 9423 11120 113026 ns/op 97312 ns/op
BenchmarkG1MulWNAF/Big,_window:_2-32 10000 10000 133106 ns/op 120360 ns/op
BenchmarkG1MulWNAF/Fr,_window:_3-32 10000 12624 118624 ns/op 98030 ns/op
BenchmarkG1MulWNAF/Big,_window:_3-32 8514 9373 133294 ns/op 124872 ns/op
BenchmarkG1MulWNAF/Fr,_window:_4-32 9913 10000 129724 ns/op 109758 ns/op
BenchmarkG1MulWNAF/Big,_window:_4-32 8151 9626 129339 ns/op 121394 ns/op
BenchmarkG1MulWNAF/Fr,_window:_5-32 9234 10000 128689 ns/op 106181 ns/op
BenchmarkG1MulWNAF/Big,_window:_5-32 8271 10033 140273 ns/op 133795 ns/op
BenchmarkG1MulWNAF/Fr,_window:_6-32 7934 161776 ns/op
BenchmarkG1MulWNAF/Big,_window:_6-32 7244 186107 ns/op
BenchmarkG1MulWNAF/Fr,_window:_7-32 5818 202863 ns/op
BenchmarkG1MulWNAF/Big,_window:_7-32 4736 219887 ns/op
BenchmarkG1MulGLV/Naive-32 9211 9111 126705 ns/op 129577 ns/op
BenchmarkG1MulGLV/Fr,_window:_1-32 12975 14316 87115 ns/op 77555 ns/op
BenchmarkG1MulGLV/Big,_window:_1-32 10797 10000 124148 ns/op 105504 ns/op
BenchmarkG1MulGLV/Fr,_window:_2-32 15717 16918 78684 ns/op 71145 ns/op
BenchmarkG1MulGLV/Big,_window:_2-32 10000 13036 103570 ns/op 99024 ns/op
BenchmarkG1MulGLV/Fr,_window:_3-32 15278 16030 87883 ns/op 73374 ns/op
BenchmarkG1MulGLV/Big,_window:_3-32 10928 13880 110459 ns/op 91182 ns/op
BenchmarkG1MulGLV/Fr,_window:_4-32 14187 15295 107709 ns/op 80290 ns/op
BenchmarkG1MulGLV/Big,_window:_4-32 9690 11877 113151 ns/op 105051 ns/op
BenchmarkG1MulGLV/Fr,_window:_5-32 10059 12541 123242 ns/op 95641 ns/op
BenchmarkG1MulGLV/Big,_window:_5-32 7377 11188 152681 ns/op 103277 ns/op
BenchmarkG1MulGLV/Fr,_window:_6-32 7101 163839 ns/op
BenchmarkG1MulGLV/Big,_window:_6-32 6588 154301 ns/op
BenchmarkG1MulGLV/Fr,_window:_7-32 5584 204235 ns/op
BenchmarkG1MulGLV/Big,_window:_7-32 5283 221057 ns/op
BenchmarkG1MultiExp/100-32 399 393 3038978 ns/op 3118516 ns/op
BenchmarkG1MultiExp/1000-32 69 75 16212074 ns/op 16841531 ns/op
BenchmarkG2MulWNAF/Naive-32 4137 4191 296644 ns/op 280547 ns/op
BenchmarkG2MulWNAF/Fr,_window:_1-32 5106 5712 240345 ns/op 203272 ns/op
BenchmarkG2MulWNAF/Big,_window:_1-32 5184 5575 265715 ns/op 244774 ns/op
BenchmarkG2MulWNAF/Fr,_window:_2-32 5048 5533 255389 ns/op 210939 ns/op
BenchmarkG2MulWNAF/Big,_window:_2-32 4590 5036 269946 ns/op 224165 ns/op
BenchmarkG2MulWNAF/Fr,_window:_3-32 5312 5796 224518 ns/op 211165 ns/op
BenchmarkG2MulWNAF/Big,_window:_3-32 4630 5412 239869 ns/op 224360 ns/op
BenchmarkG2MulWNAF/Fr,_window:_4-32 5008 5554 239132 ns/op 203044 ns/op
BenchmarkG2MulWNAF/Big,_window:_4-32 4561 5437 242594 ns/op 234347 ns/op
BenchmarkG2MulWNAF/Fr,_window:_5-32 5698 6247 228788 ns/op 221313 ns/op
BenchmarkG2MulWNAF/Big,_window:_5-32 4977 5979 277129 ns/op 237378 ns/op
BenchmarkG2MulWNAF/Fr,_window:_6-32 3754 334291 ns/op
BenchmarkG2MulWNAF/Big,_window:_6-32 4237 345307 ns/op
BenchmarkG2MulWNAF/Fr,_window:_7-32 2660 413984 ns/op
BenchmarkG2MulWNAF/Big,_window:_7-32 2678 461477 ns/op
BenchmarkG2MulGLV/Naive-32 3782 4016 299864 ns/op 289786 ns/op
BenchmarkG2MulGLV/Fr,_window:_1-32 6756 7719 193997 ns/op 153801 ns/op
BenchmarkG2MulGLV/Big,_window:_1-32 5253 6594 203329 ns/op 167729 ns/op
BenchmarkG2MulGLV/Fr,_window:_2-32 6775 7752 154446 ns/op 152584 ns/op
BenchmarkG2MulGLV/Big,_window:_2-32 5881 8366 177652 ns/op 174940 ns/op
BenchmarkG2MulGLV/Fr,_window:_3-32 7868 9421 160579 ns/op 133884 ns/op
BenchmarkG2MulGLV/Big,_window:_3-32 6894 6538 188744 ns/op 163645 ns/op
BenchmarkG2MulGLV/Fr,_window:_4-32 7194 7750 180463 ns/op 149795 ns/op
BenchmarkG2MulGLV/Big,_window:_4-32 5620 6684 198709 ns/op 167899 ns/op
BenchmarkG2MulGLV/Fr,_window:_5-32 6284 5954 233635 ns/op 186982 ns/op
BenchmarkG2MulGLV/Big,_window:_5-32 4880 8649 215989 ns/op 204473 ns/op
BenchmarkG2MulGLV/Fr,_window:_6-32 7910 290345 ns/op
BenchmarkG2MulGLV/Big,_window:_6-32 3234 316511 ns/op
BenchmarkG2MulGLV/Fr,_window:_7-32 5815 394321 ns/op
BenchmarkG2MulGLV/Big,_window:_7-32 2563 431117 ns/op
BenchmarkG2MultiExp/100-32 169 160 6877217 ns/op 7044235 ns/op
BenchmarkG2MultiExp/1000-32 30 30 40613382 ns/op 41251641 ns/op
PASS
Hey @kilic, can we productionize this improvement? I can polish the code and make the PR ready if you can give some initial feedback?
@protolambda Thanks! Sure we can add this improvement.
I'm happy with switching to []PointG1/2
. Can you also apply the same to the MultiExpBig
function?
Constant window size and smaller type also sound good to me. It seems like window size 4
gives the best result for G1 and both wnaf and glv multiplications. And again for both wnaf and glv in G2 there is a tiny difference between 3
and 4
. Shared my results below and includes Fr
type.
G1 wnaf
BenchmarkG1MulWNAF/Fr,_window:_1-8 8690 134234 ns/op
BenchmarkG1MulWNAF/Fr,_window:_2-8 8943 127914 ns/op
BenchmarkG1MulWNAF/Fr,_window:_3-8 9138 125774 ns/op
BenchmarkG1MulWNAF/Fr,_window:_4-8 9255 122877 ns/op
BenchmarkG1MulWNAF/Fr,_window:_5-8 9462 126541 ns/op
G1 glv
BenchmarkG1MulGLV/Fr,_window:_1-8 10000 100515 ns/op
BenchmarkG1MulGLV/Fr,_window:_2-8 13446 89177 ns/op
BenchmarkG1MulGLV/Fr,_window:_3-8 13623 87922 ns/op
BenchmarkG1MulGLV/Fr,_window:_4-8 13941 85936 ns/op
BenchmarkG1MulGLV/Fr,_window:_5-8 13285 89931 ns/op
BenchmarkG1MulGLV/Fr,_window:_6-8 10000 103667 ns/op
G2 wnaf
BenchmarkG2MulWNAF/Fr,_window:_1-8 3534 333858 ns/op
BenchmarkG2MulWNAF/Fr,_window:_2-8 3535 327158 ns/op
BenchmarkG2MulWNAF/Fr,_window:_3-8 3783 309761 ns/op
BenchmarkG2MulWNAF/Fr,_window:_4-8 3913 302329 ns/op
BenchmarkG2MulWNAF/Fr,_window:_5-8 3908 303272 ns/op
BenchmarkG2MulWNAF/Fr,_window:_6-8 3643 322695 ns/op
G1 glv
BenchmarkG2MulGLV/Fr,_window:_1-8 4539 232194 ns/op
BenchmarkG2MulGLV/Fr,_window:_2-8 5492 211677 ns/op
BenchmarkG2MulGLV/Fr,_window:_3-8 5928 195555 ns/op
BenchmarkG2MulGLV/Fr,_window:_4-8 6051 195011 ns/op
BenchmarkG2MulGLV/Fr,_window:_5-8 5682 205829 ns/op
BenchmarkG2MulGLV/Fr,_window:_6-8 4831 241982 ns/op