fastecdsa
fastecdsa copied to clipboard
Speed compare with python-ecdsa?
I've recently merged a faster implementation of elliptic curve arithmetic to python-ecdsa (https://github.com/warner/python-ecdsa/pull/127) and was trying to compare performance with this library, but I'm getting some silly numbers.
(after running pip install -e .
I got the benchmark running, see #44 though)
with fastecdsa
50a3cdfdc3824 I'm getting
1000 signatures and verifications with curve P192 took 1.47 seconds
1000 signatures and verifications with curve P224 took 1.89 seconds
1000 signatures and verifications with curve P256 took 2.42 seconds
1000 signatures and verifications with curve P384 took 5.15 seconds
1000 signatures and verifications with curve P521 took 9.48 seconds
1000 signatures and verifications with curve secp192k1 took 1.47 seconds
1000 signatures and verifications with curve secp224k1 took 1.89 seconds
1000 signatures and verifications with curve secp256k1 took 2.43 seconds
1000 signatures and verifications with curve brainpoolP160r1 took 1.05 seconds
1000 signatures and verifications with curve brainpoolP192r1 took 1.46 seconds
1000 signatures and verifications with curve brainpoolP224r1 took 1.88 seconds
1000 signatures and verifications with curve brainpoolP256r1 took 2.41 seconds
1000 signatures and verifications with curve brainpoolP320r1 took 3.66 seconds
1000 signatures and verifications with curve brainpoolP384r1 took 5.12 seconds
1000 signatures and verifications with curve brainpoolP512r1 took 8.96 seconds
while with current master of python-ecdsa (8deb089e7d5), with gmpy2 installed, I'm getting:
siglen keygen keygen/s sign sign/s verify verify/s
NIST192p: 48 0.00016s 6138.39 0.00017s 5781.36 0.00033s 2996.27
NIST224p: 56 0.00020s 4882.81 0.00022s 4625.53 0.00042s 2391.89
NIST256p: 64 0.00023s 4332.80 0.00024s 4130.10 0.00048s 2092.61
NIST384p: 96 0.00040s 2503.29 0.00041s 2420.89 0.00080s 1247.25
NIST521p: 132 0.00071s 1417.89 0.00072s 1388.78 0.00139s 721.06
SECP256k1: 64 0.00023s 4276.34 0.00024s 4149.60 0.00046s 2187.62
BRAINPOOLP160r1: 40 0.00014s 7198.78 0.00015s 6814.04 0.00029s 3495.85
BRAINPOOLP192r1: 48 0.00016s 6143.59 0.00017s 5831.40 0.00032s 3121.77
BRAINPOOLP224r1: 56 0.00020s 4892.69 0.00022s 4578.01 0.00041s 2458.99
BRAINPOOLP256r1: 64 0.00023s 4321.77 0.00024s 4156.66 0.00051s 1972.34
BRAINPOOLP320r1: 80 0.00031s 3218.62 0.00032s 3100.46 0.00060s 1660.37
BRAINPOOLP384r1: 96 0.00041s 2465.49 0.00042s 2401.97 0.00080s 1248.43
BRAINPOOLP512r1: 128 0.00062s 1609.65 0.00063s 1576.10 0.00127s 789.65
so for NIST192p, the combined sign and verify looks to be almost 3 times faster than fastecdsa
, which doesn't make sense for the non-native code in python-ecdsa
... Are the benchmark commands doing different things?
Without knowing the details of the changes you made with ecdsa
and the how ecdsa
benchmarks itself it's hard to say where the discrepancy is. A couple possibilities come to mind -
-
ecdsa
uses a faster multiplication algorithm than the Montgomery Ladder (which is a bit slower but mitigates some timing side channels - although not all see issue #40 ) -
gmpy2
has some optimizations that make it perform better on certain platforms, or perhaps is compiled with more aggressive flags (this seems unlikely though) -
ecdsa
benchmarks use small private keys, which would make for quicker multiplication operations in the EC group.
I'm happy to look at this further (and if it's a matter of gmpy2
being the speedup of switching from native C code to that package). Currently the focus for this package is on switching to being python3 only since python2 is EOL, so there'll be more time to look at this and consider optimizations one the 2.0
release is in a stable state (shouldn't be too long from now)
Looking further at the merge request you have these are candidates for the increase in speed as well -
- Use 2-ary Non-Adjacent Form for multiplication
- Precompute point doublings for curve generators and public keys
I think pre-computation can be considered as an optimization for this package but if I remember correctly 2ary NAF leaks timing information that can be used to gain knowledge of the secret key.
Without knowing the details of the changes you made with
ecdsa
and the howecdsa
benchmarks itself it's hard to say where the discrepancy is. A couple possibilities come to mind -
ecdsa
uses a faster multiplication algorithm than the Montgomery Ladder (which is a bit slower but mitigates some timing side channels - although not all see issue #40 )
that would most likely explain this: ecdsa
doesn't even try to protect against side-channels, so it can use algorithms that do leak timing information
that being said, libgmp
is side channel secure? I see mpz_powm_sec
in documentation, but nothing similar for multiplication...
gmpy2
has some optimizations that make it perform better on certain platforms, or perhaps is compiled with more aggressive flags (this seems unlikely though)
it is just a python binding to libgmp
, which as I understand, fastecdsa
uses too, and I didn't compile different versions of libgmp
for those tests
ecdsa
benchmarks use small private keys, which would make for quicker multiplication operations in the EC group.
no, the private keys use the full size of the order for a given curve
I think pre-computation can be considered as an optimization for this package but if I remember correctly 2ary NAF leaks timing information that can be used to gain knowledge of the secret key.
yes, both precomputation and NAF leak, but it does matter only for signing and keypair generation, not for signature verification
that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...
Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -
The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)
But perhaps that is an incorrect reading on my part.
I'm happy to discuss further performance improvements here, but you're correct that doing some security tradeoffs will allow for bigger performance improvements, as seen in the changes you made for ecdsa
(which as you stated wasn't aimed at mitigating most side channels to begin with). Your ecdsa
improvments are great btw, kudos for making such a big performance improvement for such a widely used library.
I personally would love to see some tradeoffs being applied to fastecdsa. In our application, we use very rare keygens, a few sigs, and a lot of sig verification. a "quicker but unsafe" mode would be very helpful for the later. Moreover, the pubkeys for sig verification are mostly known, so a cache of precomputed values could make sense.
I'll take a look at the best way to do this once the 2.0
release is stable, shouldn't be much longer.
that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...
Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -
The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)
I'm afraid that they meant specifically the mpz_powm_sec
method by "cryptography"... I really can't imagine how you can multiply two arbitrary integers without knowing the size of the field (providing the modulo for multiplication) and not leak bitsize of operands. I might have missed this method, but I don't see a method for multiplying two numbers and calculating modulo as a single operation.
@EggPool :
Moreover, the pubkeys for sig verification are mostly known, so a cache of precomputed values could make sense.
the problem is that precomputing values for public keys is fairly expensive, if precomputation is used together with NAF, you'd need to verify at least 80-100 signatures made with one and the same key for the precomputation to break even
that being said, libgmp is side channel secure? I see mpz_powm_sec in documentation, but nothing similar for multiplication...
Based on the GMP documentation it seems implicit that unless otherwise specified (e.g. unless there is an explicit secure function for an operation) that operations are suitable for cryptography -
The main target applications for GMP are cryptography applications and research ([src] What is GMP? section)
I'm afraid that they meant specifically the
mpz_powm_sec
method by "cryptography"... I really can't imagine how you can multiply two arbitrary integers without knowing the size of the field (providing the modulo for multiplication) and not leak bitsize of operands. I might have missed this method, but I don't see a method for multiplying two numbers and calculating modulo as a single operation.
I've looked a bit more into the GNU MP, and I'm pretty sure that the "useful for cryptography" is limited to functions explicitly listed in https://gmplib.org/manual/Low_002dlevel-Functions#Low_002dlevel-functions-for-cryptography:
In addition to these specially crafted functions, the following mpn functions are naturally side-channel resistant: mpn_add_n, mpn_sub_n, mpn_lshift, mpn_rshift, mpn_zero, mpn_copyi, mpn_copyd, mpn_com, and the logical function (mpn_and_n, etc).