fastecdsa Speed compare with python-ecdsa?

I've recently merged a faster implementation of elliptic curve arithmetic to python-ecdsa (https://github.com/warner/python-ecdsa/pull/127) and was trying to compare performance with this library, but I'm getting some silly numbers.

(after running pip install -e . I got the benchmark running, see #44 though)

with fastecdsa 50a3cdfdc3824 I'm getting

1000 signatures and verifications with curve P192 took 1.47 seconds
1000 signatures and verifications with curve P224 took 1.89 seconds
1000 signatures and verifications with curve P256 took 2.42 seconds
1000 signatures and verifications with curve P384 took 5.15 seconds
1000 signatures and verifications with curve P521 took 9.48 seconds
1000 signatures and verifications with curve secp192k1 took 1.47 seconds
1000 signatures and verifications with curve secp224k1 took 1.89 seconds
1000 signatures and verifications with curve secp256k1 took 2.43 seconds
1000 signatures and verifications with curve brainpoolP160r1 took 1.05 seconds
1000 signatures and verifications with curve brainpoolP192r1 took 1.46 seconds
1000 signatures and verifications with curve brainpoolP224r1 took 1.88 seconds
1000 signatures and verifications with curve brainpoolP256r1 took 2.41 seconds
1000 signatures and verifications with curve brainpoolP320r1 took 3.66 seconds
1000 signatures and verifications with curve brainpoolP384r1 took 5.12 seconds
1000 signatures and verifications with curve brainpoolP512r1 took 8.96 seconds

while with current master of python-ecdsa (8deb089e7d5), with gmpy2 installed, I'm getting:

                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
        NIST192p:     48   0.00016s   6138.39   0.00017s   5781.36   0.00033s   2996.27
        NIST224p:     56   0.00020s   4882.81   0.00022s   4625.53   0.00042s   2391.89
        NIST256p:     64   0.00023s   4332.80   0.00024s   4130.10   0.00048s   2092.61
        NIST384p:     96   0.00040s   2503.29   0.00041s   2420.89   0.00080s   1247.25
        NIST521p:    132   0.00071s   1417.89   0.00072s   1388.78   0.00139s    721.06
       SECP256k1:     64   0.00023s   4276.34   0.00024s   4149.60   0.00046s   2187.62
 BRAINPOOLP160r1:     40   0.00014s   7198.78   0.00015s   6814.04   0.00029s   3495.85
 BRAINPOOLP192r1:     48   0.00016s   6143.59   0.00017s   5831.40   0.00032s   3121.77
 BRAINPOOLP224r1:     56   0.00020s   4892.69   0.00022s   4578.01   0.00041s   2458.99
 BRAINPOOLP256r1:     64   0.00023s   4321.77   0.00024s   4156.66   0.00051s   1972.34
 BRAINPOOLP320r1:     80   0.00031s   3218.62   0.00032s   3100.46   0.00060s   1660.37
 BRAINPOOLP384r1:     96   0.00041s   2465.49   0.00042s   2401.97   0.00080s   1248.43
 BRAINPOOLP512r1:    128   0.00062s   1609.65   0.00063s   1576.10   0.00127s    789.65

so for NIST192p, the combined sign and verify looks to be almost 3 times faster than fastecdsa, which doesn't make sense for the non-native code in python-ecdsa... Are the benchmark commands doing different things?

Dec 02 '19 22:12 tomato42

Without knowing the details of the changes you made with ecdsa and the how ecdsa benchmarks itself it's hard to say where the discrepancy is. A couple possibilities come to mind -

ecdsa uses a faster multiplication algorithm than the Montgomery Ladder (which is a bit slower but mitigates some timing side channels - although not all see issue #40 )
gmpy2 has some optimizations that make it perform better on certain platforms, or perhaps is compiled with more aggressive flags (this seems unlikely though)
ecdsa benchmarks use small private keys, which would make for quicker multiplication operations in the EC group.

I'm happy to look at this further (and if it's a matter of gmpy2 being the speedup of switching from native C code to that package). Currently the focus for this package is on switching to being python3 only since python2 is EOL, so there'll be more time to look at this and consider optimizations one the 2.0 release is in a stable state (shouldn't be too long from now)

Dec 04 '19 07:12 AntonKueltz

Looking further at the merge request you have these are candidates for the increase in speed as well -

Use 2-ary Non-Adjacent Form for multiplication
Precompute point doublings for curve generators and public keys

I think pre-computation can be considered as an optimization for this package but if I remember correctly 2ary NAF leaks timing information that can be used to gain knowledge of the secret key.

Dec 04 '19 07:12 AntonKueltz

Without knowing the details of the changes you made with ecdsa and the how ecdsa benchmarks itself it's hard to say where the discrepancy is. A couple possibilities come to mind -

ecdsa uses a faster multiplication algorithm than the Montgomery Ladder (which is a bit slower but mitigates some timing side channels - although not all see issue #40 )

that would most likely explain this: ecdsa doesn't even try to protect against side-channels, so it can use algorithms that do leak timing information

that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...

gmpy2 has some optimizations that make it perform better on certain platforms, or perhaps is compiled with more aggressive flags (this seems unlikely though)

it is just a python binding to libgmp, which as I understand, fastecdsa uses too, and I didn't compile different versions of libgmp for those tests

ecdsa benchmarks use small private keys, which would make for quicker multiplication operations in the EC group.

no, the private keys use the full size of the order for a given curve

I think pre-computation can be considered as an optimization for this package but if I remember correctly 2ary NAF leaks timing information that can be used to gain knowledge of the secret key.

yes, both precomputation and NAF leak, but it does matter only for signing and keypair generation, not for signature verification

Dec 04 '19 10:12 tomato42

that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...

Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -

The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)

But perhaps that is an incorrect reading on my part.

I'm happy to discuss further performance improvements here, but you're correct that doing some security tradeoffs will allow for bigger performance improvements, as seen in the changes you made for ecdsa (which as you stated wasn't aimed at mitigating most side channels to begin with). Your ecdsa improvments are great btw, kudos for making such a big performance improvement for such a widely used library.

Dec 05 '19 21:12 AntonKueltz

I personally would love to see some tradeoffs being applied to fastecdsa. In our application, we use very rare keygens, a few sigs, and a lot of sig verification. a "quicker but unsafe" mode would be very helpful for the later. Moreover, the pubkeys for sig verification are mostly known, so a cache of precomputed values could make sense.

Dec 06 '19 07:12 EggPool

I'll take a look at the best way to do this once the 2.0 release is stable, shouldn't be much longer.

Dec 09 '19 06:12 AntonKueltz

that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...

Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -

The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)

I'm afraid that they meant specifically the mpz_powm_sec method by "cryptography"... I really can't imagine how you can multiply two arbitrary integers without knowing the size of the field (providing the modulo for multiplication) and not leak bitsize of operands. I might have missed this method, but I don't see a method for multiplying two numbers and calculating modulo as a single operation.

@EggPool :

Moreover, the pubkeys for sig verification are mostly known, so a cache of precomputed values could make sense.

the problem is that precomputing values for public keys is fairly expensive, if precomputation is used together with NAF, you'd need to verify at least 80-100 signatures made with one and the same key for the precomputation to break even

Dec 10 '19 13:12 tomato42

that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...

Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -

The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)

I'm afraid that they meant specifically the mpz_powm_sec method by "cryptography"... I really can't imagine how you can multiply two arbitrary integers without knowing the size of the field (providing the modulo for multiplication) and not leak bitsize of operands. I might have missed this method, but I don't see a method for multiplying two numbers and calculating modulo as a single operation.

I've looked a bit more into the GNU MP, and I'm pretty sure that the "useful for cryptography" is limited to functions explicitly listed in https://gmplib.org/manual/Low_002dlevel-Functions#Low_002dlevel-functions-for-cryptography:

In addition to these specially crafted functions, the following mpn functions are naturally side-channel resistant: mpn_add_n, mpn_sub_n, mpn_lshift, mpn_rshift, mpn_zero, mpn_copyi, mpn_copyd, mpn_com, and the logical function (mpn_and_n, etc).

May 02 '22 17:05 tomato42

fastecdsa fastecdsa copied to clipboard

Speed compare with python-ecdsa?

fastecdsa
fastecdsa copied to clipboard