curve25519-dalek

Problems in reproducing benchmark speeds for a vector commitment

Open mskd12 opened this issue 4 years ago • 1 comments

Hello!

As the title suggests, I am looking for a vector commitment (VC) scheme whose commit operation is as fast as possible (ideally a few milliseconds for vectors of size 10000). Curve25519 seemed like a natural choice. In addition, this library implements other techniques that help my cause, e.g., precomputation and multi-exponentiation (which came as a pleasant surprise, so thanks for that!). I noted the times reported for vartime_precomputed_pure_static (in the benchmark code), which I believe is precisely what I want; they are given below.

vector size, time to commit [lower bound, estimate, upper bound]
10      [102.66 us 103.13 us 103.75 us]
100     [800.65 us 818.44 us 842.08 us]
1000    [7.6610 ms 7.7041 ms 7.7499 ms]
5000    [48.107 ms 50.274 ms 52.750 ms]

However, I am unable to reproduce these numbers with the following small sample code (link). What's baffling is that I consistently see numbers about 10x worse than the benchmark ones. For example, for a vector of size 100, the commit takes 10.14 ms (versus under 1 ms above). Any thoughts on why this might be happening and how I can reproduce the benchmark speeds?

More general thoughts on other tricks I might use to speed up commits are welcome! The one thing I have not yet tried is better hardware or instruction sets.

mskd12, Mar 16 '21 18:03

Hi @mskd12!

I'm not sure of the details of what you're doing, but I believe you'll probably want to use the constant-time fixed-base multiscalar multiplication implementations for generating the commitments.

Off the top of my head, quite a few things could be causing the discrepancy in benchmark times. The primary one I can see is that we use Criterion for our benchmark code: it warms up the CPU and caches by running the benchmark a number of times before taking measurements, and it also detects and flags outliers in the results, among other tricks. Are the numbers above from running cargo bench, or something else?
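To make the warm-up point concrete, here is a minimal hand-rolled timing sketch that uses only the Rust standard library. It is not Criterion and not part of curve25519-dalek: the `measure` and `median` helpers are hypothetical names, and the loop in `main` is a placeholder standing in for the real commit call. It also warns about unoptimized builds, since `cargo bench` compiles with an optimized profile; whether that applies to the sample code in this thread is an assumption.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `warmup` times untimed, then records `samples` individual timings.
/// (Hypothetical helper, not a Criterion or curve25519-dalek API.)
fn measure<F: FnMut()>(warmup: usize, samples: usize, mut f: F) -> Vec<Duration> {
    // Warm-up runs prime caches and branch predictors; their timings are discarded.
    for _ in 0..warmup {
        f();
    }
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Median of the timings (upper median for even lengths).
/// The median is less sensitive to outliers than the mean.
fn median(mut timings: Vec<Duration>) -> Duration {
    assert!(!timings.is_empty(), "need at least one sample");
    timings.sort();
    timings[timings.len() / 2]
}

fn main() {
    // `cargo bench` builds with optimizations; a plain `cargo run` does not.
    if cfg!(debug_assertions) {
        eprintln!("warning: unoptimized build, timings will not match `cargo bench`");
    }

    // Placeholder workload; replace the closure body with the commit operation.
    let mut acc: u64 = 0;
    let timings = measure(100, 1_000, || {
        for i in 0..10_000u64 {
            // black_box keeps the optimizer from deleting the loop.
            acc = std::hint::black_box(acc.wrapping_mul(31).wrapping_add(i));
        }
    });
    println!("median time per call: {:?}", median(timings));
}
```

Comparing the median from a run like this (built with `--release`) against a single cold measurement gives a rough idea of how much of a gap comes from methodology instead of the library itself.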

Also, if you have access to a machine with AVX2/IFMA, you'll get faster results. (Details on how to build with those backend features are in the README.)
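As a quick sanity check before trying a vectorized backend, the sketch below uses only the standard library to report whether the CPU supports AVX2 at runtime and whether the current build was compiled with AVX2 enabled. The function name `cpu_has_avx2` is hypothetical, and the snippet does not show the library's own build flags, which are documented in the README.

```rust
/// Runtime check on x86/x86_64: does the running CPU report AVX2 support?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On other architectures AVX2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

fn main() {
    // Runtime capability of the machine the binary is running on.
    println!("CPU supports AVX2:        {}", cpu_has_avx2());
    // Compile-time setting: true only if the compiler was told to target AVX2.
    println!("build compiled with AVX2: {}", cfg!(target_feature = "avx2"));
}
```

If the first line prints `true` and the second prints `false`, the hardware is capable but the build is not using it.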

isislovecruft, Mar 25 '21 03:03