Improve FIPS RSA keygen performance - change BN_gcd() test
FIPS 186-4 has 5 different algorithms for key generation, and all of them rely on testing GCD(a,n) == 1 many times.
Cachegrind was showing that the function BN_gcd() was taking a considerable percentage of the total cycles, during a rsa keygen operation.
The default provider uses multiprime keygen, which seemed to be much faster. This is because it uses BN_mod_inverse() instead.
For a 4096 bit key, the entropy of a key that was taking a long time to generate was recorded and fed back into subsequent runs. Roughly 40% of the cycle time was BN_gcd() with most of the remainder in the prime testing. Changing to use the inverse resulted in the cyle count being 96% in the prime testing.
Checklist
- [x] documentation is added or updated
- [x] tests are added or updated

My primary concern is that BN_mod_inverse() isn't strictly constant time:
- It copies the input BNs, dropping any constant time flags
- It adds constant times flags for the modulo and division operations
- It doesn't for the multiplication and addition
- It doing Euclid's algorithm by changing pointers rather than constant time swaps or copies
- This was flagged as badness in #19309
- However, all values are accessed each iteration so exploiting the cache would seem to be a difficult side channel
These points can be addressed easily enough but will cost some performance.
- Constant time multiply and add will be fast compared to division and modulo
- Constant time BN copy will be much faster than the BN swap in #19309 because it uses
memcpy(2)instead of constant time select in a loop
I would like to see even a theoretical attack on this BN_mod_inverse use in the RSA keygen. How would it work given the key is generated once so there is no real chance for the attacker to do repeated measurement to gather some statistics. But yeah, if making BN_mod_inverse consttime would not cost much, it would make sense.
If this attack were only theoretical, why did we go to such lengths to be constant time in the BN mod function for keygen? I'm not saying it's exploitable, just that we need to be cautious.
I would like to see even a theoretical attack on this BN_mod_inverse use in the RSA keygen. How would it work given the key is generated once so there is no real chance for the attacker to do repeated measurement to gather some statistics. But yeah, if making BN_mod_inverse consttime would not cost much, it would make sense.
@t8m These seem to be very closely related:
- Section 3.3 of https://urn.fi/URN:ISBN:978-952-03-2289-2 (for a brief recap)
- https://doi.org/10.13154/tches.v2019.i4.213-242
- https://doi.acm.org/10.1145/3196494.3196524
If the attacker can probe the keygen process often enough (either through power side channel or through microarchitecutral means), they may be able to deduce exact code path taken
That being said, this:
/* (Step 1) GCD(2r1, r2) = 1 */
&& gcd_isone(tmp, r1x2, r2, ctx)
/* (Step 2) R = ((r2^-1 mod 2r1) * r2) - ((2r1^-1 mod r2)*2r1) */
&& BN_mod_inverse(R, r2, r1x2, ctx)
is calling BN_mod_inverse twice over the same arguments, so if inverse is leaky in an exploitable way, it's a problem already, and this change doesn't make the situation worse.
The second change is much harder to say if it makes the code more vulnerable or not.
Do you think it would be useful to have a public symbol BN_gcd_isone()?
Not sure if anyone noticed the comment related to the default provider. i.e the multi prime keygen already uses the inverse instead of BN_gcd() - so this has been used like this for quite a long time.
Do you think it would be useful to have a public symbol BN_gcd_isone()?
I think we should expose it as a BN_are_coprime() call.
What am I supposed to do with the tmp inv param? I was trying to avoid allocating it internally? (add it as an optional last param?)
BN_mod_inverse is allocating a pile of BIGNUMs internal, I doubt one more will hurt any.
Needs rebase or putting the libcrypto.num entry at some more clever place :grin:
This pull request is ready to merge
@paulidale please reconfirm
This pull request is ready to merge
Merged to master and 3.1 branches (after fixing the version in libcrypto.num entry). Thank you.