
BC 1.71 KeyStore.load performance downgrade

Open bsanchezb opened this issue 3 years ago • 13 comments

Hello,

When upgrading to BC 1.71 (the jdk18on artifacts), we noticed a large performance regression when loading a PKCS12 KeyStore with the BC provider. With the default Java provider, and with BC version 1.70, it works quite fast, while on version 1.71 the regression is immediately noticeable (30 seconds versus at most 3 seconds on 1.70 in our unit test). Please see our unit test below:

import static java.time.Duration.ofMillis;
import static org.junit.jupiter.api.Assertions.assertTimeout;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.Security;

import org.bouncycastle.jce.provider.BouncyCastleProvider;
import org.junit.jupiter.api.Test;

@Test
public void keyStoreProcessingTest() throws Exception {

    Security.addProvider(new BouncyCastleProvider());

    KeyStore keyStore = KeyStore.getInstance("PKCS12", "BC");

    File keystoreContentFile = new File("src/test/resources/extract-tls.p12");
    try (InputStream is = new FileInputStream(keystoreContentFile)) {
        assertTimeout(ofMillis(3000), () -> {
            keyStore.load(is, "ks-password".toCharArray());
        });
    }
}

Tested with this keystore: link.

Do you have an idea what could cause this regression?

Best regards, Aleksandr.

bsanchezb avatar Apr 07 '22 15:04 bsanchezb

Can't say we've seen anything quite like that. During the 30 seconds, does CPU utilisation drop off, or is this accompanied by high CPU usage as well?

dghgit avatar Apr 09 '22 14:04 dghgit

Hello @dghgit ,

I did some additional testing, and indeed I observe a drop in CPU utilization when running the unit test with BC jdk18on 1.71. I did my tests simply with the Task Manager, so I am not sure whether there is a better way to check CPU usage.

I do not know the reason for the problem, but the issue appears to be consistent, and the difference in execution time is about 10x between versions 1.70 and 1.71 when I switch branches.

Any help would be appreciated, as this issue currently blocks our project's upgrade to BC 1.71.

Best regards, Aleksandr.

bsanchezb avatar May 17 '22 14:05 bsanchezb

Is this on Windows or Linux? If it were Linux, I'd say it sounds like entropy exhaustion; with Windows I'm not so sure.

dghgit avatar Jun 22 '22 23:06 dghgit

Hi @dghgit ,

It is Windows. I have not tested on Linux.

Best regards, Aleksandr.

bsanchezb avatar Jun 28 '22 06:06 bsanchezb

If you can generate a thread dump during the slowdown period, that might help. Another question: if it is Windows, the SunMSCAPI provider is present, isn't it? If it isn't, try it with it.

dghgit avatar Jul 31 '22 08:07 dghgit

Please find below the thread dumps taken for versions 1.70 and 1.71, from a simple unit test execution in both run and debug mode. For 1.70, pausing the thread lands in a different method each time within the thread dump, while in 1.71 it always stops in the same method (org.bouncycastle.math.Primes.enhancedMRProbablePrimeTest).

I tried with SunMSCAPI, but it does not seem to make a difference.

threads_report_1.71_debug.txt threads_report_1.71_run_1.txt threads_report_1.71_run_2.txt threads_report_1.70_debug.txt threads_report_1.70_run.txt

After a quick analysis, it seems that this commit may be causing the issue, as it adds additional validation of the RSA modulus, in particular with the Primes.enhancedMRProbablePrimeTest method, which appears to be very expensive (a 10x difference for our keystore). A timing sketch of the method in isolation is shown below.
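
To see the cost of this method in isolation, here is a minimal timing sketch (the 2048-bit size and the iteration count of 3 are illustrative, not taken from the BC internals):

import java.math.BigInteger;
import java.security.SecureRandom;

import org.bouncycastle.math.Primes;

public class MrCostSketch {
    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        // A random 2048-bit odd number stands in for an RSA modulus.
        BigInteger n = new BigInteger(2048, random).setBit(0);

        long start = System.nanoTime();
        Primes.MROutput out = Primes.enhancedMRProbablePrimeTest(n, random, 3);
        long elapsed = System.nanoTime() - start;

        System.out.printf("composite: %b, one call took %.1f ms%n",
                out.isProvablyComposite(), elapsed / 1e6);
    }
}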

bsanchezb avatar Aug 01 '22 07:08 bsanchezb

Ah, okay, yes. How many RSA keys are in the store? It sounds like quite a few. Looking at the stack trace, the time is being spent on RSA modulus validation, which is expensive. The only thing I can really suggest here is that we make it configurable, although the defaults are the minimum required for importing an RSA public key securely. You might want to consider changing the way RSA keys are imported.

dghgit avatar Aug 01 '22 10:08 dghgit

Hello @dghgit,

The keystore is quite special; it is used primarily for performance regression testing, which is why we spotted the issue in the first place. It contains almost 2500 certificates, most of which use RSA keys. It is built from certificates extracted from the Trusted Lists of all European Member States (referenced by the EU LOTL). The link to the keystore is in the first comment of this thread (I forgot to provide the password for the keystore: use "ks-password").

It is good to know that the reason for the performance regression is indeed a security measure. Thank you for confirming this.

I think making the validation configurable would be beneficial in some cases; however, I will not insist on this feature, as using such huge keystores in production is quite rare, and the extraction result could be cached if needed.

Thank you again for checking this.

KR, Aleksandr.

bsanchezb avatar Aug 01 '22 11:08 bsanchezb

Hi @dghgit again,

After some additional analysis, we found that the validation is in fact applied not only on KeyStore loading, but on every X509Certificate creation (and perhaps other operations with RSA keys?), as the measurement sketch below illustrates. So the performance issue is actually still quite critical for us. Would it be possible to improve the performance of the org.bouncycastle.math.Primes.enhancedMRProbablePrimeTest method? Otherwise, we would really need a workaround to be able to disable the verification.
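
This is roughly how the cost on certificate creation can be measured (a sketch; "rsa-cert.cer" is a placeholder for any RSA certificate file):

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.Security;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class CertCreationTiming {
    public static void main(String[] args) throws Exception {
        Security.addProvider(new BouncyCastleProvider());
        CertificateFactory cf = CertificateFactory.getInstance("X.509", "BC");

        long start = System.nanoTime();
        // Placeholder path for any DER/PEM-encoded RSA certificate.
        try (InputStream in = new FileInputStream("rsa-cert.cer")) {
            // Creating the certificate also creates its RSA public key,
            // which appears to be where the modulus validation kicks in.
            Certificate cert = cf.generateCertificate(in);
            System.out.println(cert.getPublicKey().getAlgorithm());
        }
        System.out.printf("certificate creation took %.1f ms%n",
                (System.nanoTime() - start) / 1e6);
    }
}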

Thank you.

Best regards, Aleksandr.

bsanchezb avatar Aug 01 '22 14:08 bsanchezb

Hi @dghgit ,

Do you have any updates on this issue? Apparently 1.72 has been released without a fix for this problem.

This is a blocking issue for us, as it increases the total processing time of the EU LOTL plus all dependent Trusted Lists by a factor of 2-3 (the creation of ~2500 certificates, most of which have RSA keys). Otherwise we would need to delegate the processing to the default Java provider, which would imply additional complexity in our code that we would like to avoid.

Best regards, Aleksandr.

bsanchezb avatar Sep 29 '22 08:09 bsanchezb

Try https://www.bouncycastle.org/betas 173b09 or later. There is now a system/security property, "org.bouncycastle.rsa.max_mr_tests", which can be used to set a cap on the number of tests done, with 0 disabling them altogether. I'd suggest at least trying a setting of 1 first rather than zero. The number of tests usually ranges from 3 to 7.
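
For example, setting the cap programmatically before the provider is first used (a sketch; the value "1" is only illustrative):

import java.security.Security;

import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class MaxMrTestsSketch {
    public static void main(String[] args) {
        // Must be set before the BC RSA code reads it; alternatively pass
        // -Dorg.bouncycastle.rsa.max_mr_tests=1 on the JVM command line.
        // A value of "0" disables the modulus MR tests altogether.
        System.setProperty("org.bouncycastle.rsa.max_mr_tests", "1");

        Security.addProvider(new BouncyCastleProvider());
        // ... load the PKCS12 keystore as in the test above ...
    }
}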

dghgit avatar Nov 20 '22 22:11 dghgit

Hi @dghgit ,

Thank you for the property. I confirm that skipping all tests (with the value "0") solves the problem. However, even setting the value to "1" brings the processing time back up to the previous 1.71/1.72 values, and I do not notice a significant difference between the value "1" and others (such as "3" or "7").

But anyway, thank you again for the resolution.

Looking forward to 1.73.

Best regards, Aleksandr.

bsanchezb avatar Nov 21 '22 08:11 bsanchezb

This issue seems to be caused by the fact that BC is running a Miller-Rabin primality test on the RSA modulus. This would explain why changing the number of runs to anything but 0 does not resolve your issue, given that the MR test will succeed on the first try with relatively high likelihood.

Primality checking the RSA modulus is a strange choice: while RSA with a prime modulus would indeed be insecure, so would RSA with any number of other defects in the key generation, and I am not aware of any vulnerability that has been caused by using a prime modulus. On the other hand, a relatively common vulnerability is the use of close values for p and q (i.e. if q was generated as p.nextProbablePrime()), but the code is not running Fermat's method on the modulus (to be clear, I am not recommending doing so; it would have less performance impact than running MR, but it would still cost substantially more cycles than a simple verification).

The way I have usually seen checks for key generation defects handled is as an asynchronous batch job, not in the critical path for RSA verification. Given that the public key has to be obtained from an authenticated source (a leaked key is insecure, no matter how it was generated), that an RSA key can be broken in a myriad of ways not discernible without knowledge of the private key or running expensive attacks, and that the main reason to prefer RSA over ECDSA is verification performance (RSA verify only requires 17 modular multiplications, while MR requires a few thousand), I do not see the value of adding any check beyond the parity check (which is necessary for the Montgomery reduction to be well defined).
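
For concreteness, a single round of Fermat's method on the modulus would look something like this (again, not a recommendation; BigInteger.sqrt() requires Java 9+):

import java.math.BigInteger;

public class FermatSketch {
    // Returns true if n = p * q with |p - q| small enough that one round
    // of Fermat's method factors it (e.g. q = p.nextProbablePrime()).
    static boolean closeFactors(BigInteger n) {
        BigInteger a = n.sqrt();                    // floor(sqrt(n)), Java 9+
        if (a.multiply(a).compareTo(n) < 0) {
            a = a.add(BigInteger.ONE);              // ceil(sqrt(n))
        }
        BigInteger b2 = a.multiply(a).subtract(n);  // a^2 - n
        BigInteger b = b2.sqrt();
        // If a^2 - n is a perfect square b^2, then n = (a - b)(a + b).
        return b.multiply(b).equals(b2);
    }
}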

sophieschmieg avatar Jun 29 '23 22:06 sophieschmieg