ladybird icon indicating copy to clipboard operation
ladybird copied to clipboard

LibCrypto+AK: Merge LibCrypto/SecureRandom into AK/Random

Open colleirose opened this issue 6 months ago • 26 comments

AK/Random does the same thing as SecureRandom but with worse error handling. It can't be removed because it's needed in AK/, and OpenSSL's RAND_bytes ideally shouldn't be included in AK/. So I've kept AK/Random's approach of using the operating system CSPRNG directly, but improved the error handling, and removed SecureRandom.

This also guarantees that BlockAllocator::allocate_block in BlockAllocator.cpp will always either use a CSPRNG for allocation or throw an error, rather than falling back to the insecure RNG in the event of an error.

colleirose avatar Oct 23 '25 23:10 colleirose

Hello!

One or more of the commit messages in this PR do not match the Ladybird code submission policy, please check the lint_commits CI job for more details on which commits were flagged and why. Please do not close this PR and open another, instead modify your commit message(s) with git commit --amend and force push those changes to update this PR.

ladybird-bot avatar Oct 23 '25 23:10 ladybird-bot

@colleirose CI is failing; please make sure everything compiles locally.

gmta avatar Oct 24 '25 07:10 gmta

Why is the windows workflow not running?

R-Goc avatar Oct 24 '25 11:10 R-Goc

Why is the windows workflow not running?

Because CI does not run automatically for new contributors. I'm waiting with approval until the compilation errors on Linux have been fixed.

gmta avatar Oct 24 '25 11:10 gmta

Because CI does not run automatically for new contributors. I'm waiting with approval until the compilation errors on Linux have been fixed.

I didn't experience compilation errors on Linux but I will apply the requested changes once I'm free and check again that it all compiles correctly. I will also test in a Windows VM to make sure that the code works correctly there too, I probably should have thought of that earlier.

colleirose avatar Oct 24 '25 15:10 colleirose

I'm not a huge fan of this change for the following reasons:

  1. We lose the explicitness in code (e.g. fill_with_random vs fill_with_secure_random)
  2. CSPRNGs are fast, but still more complex and generally slower than PRNGs

I'm not a maintainer, but I think we should keep a dedicated CSPRNG in LibCrypto and potentially remove AK/Random until a high-performance PRNG is actually needed. Unless I missed something, the conversation with Jelle on discord also says something similar:

You'd have two different classes for two different purposes, regardless of their current implementation

rgret-dev avatar Oct 25 '25 15:10 rgret-dev

I'm not a huge fan of this change for the following reasons:

  1. We lose the explicitness in code (e.g. fill_with_random vs fill_with_secure_random)
  2. CSPRNGs are fast, but still more complex and generally slower than PRNGs

I'm not a maintainer, but I think we should keep a dedicated CSPRNG in LibCrypto and potentially remove AK/Random until a high-performance PRNG is actually needed.

Indeed we do lose the more explicit naming, however the quality shouldn't go down? I can read into whatever openssl is doing and see.

We don't need this to be the fastest thing on the planet. I was already able to get gigabytes per second of throughput with the windows implementation, I believe Linux shouldn't be far behind if at all. Writing a fast PRNG is also not trivial and usually needs intrinsics.

We cannot remove AK/Random as there are things that cannot depend on LibCrypto and need a random function.

R-Goc avatar Oct 25 '25 17:10 R-Goc

I explained in the PR and on Discord why I think this is a good change, but as to the point that we lose explicitness, yes, but that doesn't really matter. In fact, part of the point here is to remove the need to make the distinction, to essentially completely eliminate the possibility of using insecure randomness. I've seen this done in other projects that need cryptographic randomness, to just have one RNG function that always uses the CSPRNG, therefore completely eliminating the possibility of accidentally using insecure randomness when secure randomness is needed.

We also are not really losing explicitness given that, as I explained, AK/Random just uses a CSPRNG but with worse error handling. It just creates ambiguity when ultimately the two functions will usually do the same thing, although AK/Random lacks the same correctness checks as SecureRandom. There isn't any practical performance benefit to removing error checking, as checking for an error is probably just a compare and jump-if instruction. If two instructions are going to make the RNG unacceptably slow then it's already not fast enough and probably can't be fast enough.

I spent a while looking into ways to make a faster non-CSPRNG but the only thing I could really think of was getting the time and/or using rand() and possibly feeding it through XorShift, but I am not convinced these are faster given that all modern operating systems have very fast sources of randomness (TPM-provided TRNG, hard disk seek times, input from USB devices and similar peripherals, CPU timing jitter, RDRAND on x86, RNDR or whatever it's called on ARM, etc.) The developer of WireGuard also made many contributions to the Linux CSPRNG system in the past few years further improving both the performance and security of it.

Also, I checked the source code for Firefox and Chromium to see if they have a faster non-CSPRNG, but they both only use the system CSPRNG as a randomness source. (Firefox appears to also have functions to collect entropy from user input and other sources, but I didn't look into it further. At the very least it is unnecessary because the kernel/OS already does that, but also, not nearly as useful given that it might not be able to collect input from all connected peripherals to all applications, given the existence of sandboxing measures on some systems, and could possibly require running as root/administrator which would create a lot of unnecessary attack surface.)

If there is a system that can't quickly produce entropy with any of these sources and/or is slow at applying the mixing algorithm (ChaCha on Linux, AES on Windows) then it would almost certainly be too slow to run a web browser. In fact, such a system would likely be too slow to run any modern operating system to begin with.

And, as I explained in the PR, OpenSSL just uses the system CSPRNG and feeds it into a cipher, and while they used to have other entropy sources, they were also either removed or disabled due to being insecure and/or breaking other things. So, this gets the same amount of entropy as OpenSSL, but with less overhead, and therefore it would improve performance.

I think we should keep a dedicated CSPRNG in LibCrypto and potentially remove AK/Random until a high-performance PRNG is actually needed.

I was going to do this until I realized that the RNG function is used in one of the AK/ files, and I don't believe that the libraries can be used in AK, as I found doing so required some changes to the build system and also isn't done anywhere else. I didn't see any point in arguing about or investigating the norms/convention so I just kept AK/Random.

Of course, we could rename AK/Random to AK/SecureRandom and so on, but I just feel that it would be unnecessary.

You'd have two different classes for two different purposes, regardless of their current implementation

I don't really understand how this could apply to what I've submitted unless there's other RNG classes besides AK/Random and LibCrypto/SecureRandom. Removing SecureRandom and making AK/Random always use CSPRNG values specifically solves the problem of having two classes that do the same thing.

The discussion on Discord was much longer and received more opposition than I expected before it seemed to reach the conclusion that there's nothing wrong with the change but I'd probably have to submit a PR for it myself as it's just a change to slightly tidy up the code, and so I did submit a PR. Much of the original argument seemed to be from a misunderstanding of either what I wanted to change and/or what AK/Random does, and I apologize for that as I am not always great at communicating my intentions.

Ultimately this is just a small change to keep the code tidy and you seem to misunderstand the purpose and practical effects of it.

colleirose avatar Oct 25 '25 22:10 colleirose

I don't find the loss of explicitness very tidy, that's all. If someone is skimming the code, it's much easier to see that it's using a dedicated CSPRNG rather than having to dig into the implementation. I also find the ifdefs hard to read, but I understand there will be some of that if we want to use OS APIs directly.

We cannot remove AK/Random as there are things that cannot depend on LibCrypto

What things use random but cannot depend on LibCrypto? Just IDAllocator in AK?

rgret-dev avatar Oct 26 '25 18:10 rgret-dev

I don't find the loss of explicitness very tidy, that's all.

It returns a random number that's suitable for all use cases. I just don't see there being an issue. This completely eliminates the need for a developer to decide which to use and therefore prevents someone from using insecure randomness when security is needed. Usually it's a good idea to provide high-level abstractions for cryptographic features so that developers don't need to worry about making mistakes. Someone appears to have already made this type of mistake, as the insecure RNG is being used for allocation in BlockAllocator::allocate_block to "reduce predictability".

The Go programming language also did something similar, where they made their standard randomness function cryptographically secure.

If someone is skimming the code, it's much easier to see that it's using a dedicated CSPRNG rather than having to dig into the implementation.

And yet the current implementation is confusing because we see that there are two random functions, and when we check the functions, they do the same thing. If you seriously think this is a problem, I can make a SecureRandom and InsecureRandom file, but I don't know what you hope to change in the InsecureRandom file. Or I can just name AK/Random to AK/SecureRandom and so on, but this would be confusing when tests and the like need to use the RNG because secure randomness wouldn't be needed there. Your original request was to remove AK/Random but that cannot be done.

The best I can think of would be to name the randomness function something like crypto_get_random or whatever, but I still don't know if that's necessary. If someone actually needs to understand what the RNG does for security or whatever other reasons, they should just read the file. I don't know why anyone would be writing something that needs security and not bother to investigate what the function they're using actually does.

I can, of course, look into making a fast non-cryptographic PRNG for something like Math.random(), but this is just another thing that can break and has to be maintained and I really don't think it could improve performance. The best I could think of is creating a randomness pool at application launch using the SecureRandom function and then feeding it through XorShift and reseeding as needed, but IIRC something like this has led to browser fingerprinting in the past. This would also require maintaining a fairly large pool for it to get better throughput and latency than the CSPRNG function, which would massively increase memory usage. (Unless we feed the pool with repetitive data like the current time and then compress it, therefore increasing CPU usage, etc.)

I also find the ifdefs hard to read, but I understand there will be some of that if we want to use OS APIs directly.

Those ifdef statements are already there. I strongly encourage you to read the current AK/Random file. For reasons I've already explained we need an RNG in AK/. I just slightly modified the existing file to add more clear comments, improve error handling, and prevent signal interrupts from causing issues with the RNG on Linux.

If you want to avoid this by using a library or something (there is no other way to obtain secure randomness without the ifdef statements), I already explained that RAND_bytes has unnecessary complexity that makes it slower, and I was told that we don't want to have an OpenSSL dependency in AK.

Alternatively we can just use libsodium's RNG function, which does essentially the same thing as this PR does, although this would again require adding a dependency in AK. I just don't think it's necessary.

What things use random but cannot depend on LibCrypto? Just IDAllocator in AK?

Yes. There may be something else I'm not aware of but that was the one I ran into.

colleirose avatar Oct 26 '25 18:10 colleirose

I'm not contesting that AK/Random currently has ifdef-soup or that its implementation uses CSPRNGs already.

My final thoughts are:

  1. We keep SecureRandom in LibCrypto (and implement any OS-specific things in there)
  2. Work towards removing AK/Random (I'm experimenting with this locally at the moment)
  3. Anything that needs a high performance PRNG can introduce a dedicated class with the name of that generator (e.g. XorshiftGenerator or something) in the future

Again, non-maintainer take so feel free to ignore and let maintainers decide but that is my opinion.

rgret-dev avatar Oct 26 '25 19:10 rgret-dev

Also wanted to respond to what Jelle said:

drop RAND_bytes() in favor of a static_assert. I'd be interested to see if there are platforms out there where there's no CSPRNG API available

OpenSSL will return an error value that we currently VERIFY() on if entropy source fails or is not available, see https://docs.openssl.org/master/man3/RAND_bytes/#notes

With that, I think we should keep OpenSSL as a fallback (which would be impossible if we moved this to AK) so that other, less used platforms can generate secure random values where we lack OS-specific calls.

rgret-dev avatar Oct 26 '25 19:10 rgret-dev

Is anyone actually confident in what method openssl even uses for the rand_bytes? I see they seed it with system randomness or other things if that fails but I can't actually tell what is being used. The whole thing is a mess of engine ifdefs, dynamically getting a method, and the methods being on a ctx stuct which is hard to read without an LSP. The docs say the default is a drbg as described in NIST SP 800-90A Rev. 1. That means it is an AES-CTR prng. Reading into the code for the aes rng the code there is basically a reference implementation and is completely unoptimized. So it is very possible the OS is doing something far faster than that. The thing is, with randomness you should have a secure default. If the OS csprng is fast enough, I see no reason at all to use it.

R-Goc avatar Oct 26 '25 22:10 R-Goc

Is anyone actually confident in what method openssl even uses for the rand_bytes?

Well, I'm not. I just gave my general understanding of it, but it is very unclear to me.

I see they seed it with system randomness or other things if that fails but I can't actually tell what is being used.

Just to quote https://googleprojectzero.blogspot.com/2016/11/breaking-chain.html?m=1: "There was some OpenSSL code in Flash which uses the desktop window as a source of entropy, but as this could never work in the Chrome sandbox it’s clear that this was vestigial (or Flash’s SSL random number generator is broken, chose one or the other)."

That RNG got removed (it was from the OpenSSL upstream source) but these things are why almost everything besides system entropy was removed from the OpenSSL as an entropy source.

The whole thing is a mess of engine ifdefs, dynamically getting a method, and the methods being on a ctx stuct which is hard to read without an LSP. The docs say the default is a drbg as described in NIST SP 800-90A Rev. 1. That means it is an AES-CTR prng.

I've been looking into this, and, while I know very little about this, I suspect this may be because of FIPS compliance, as aws-lc and BoringSSL both seem to have mostly gotten rid of the other sources but still use the same cipher, and Google says this is needed for FIPS compliance.

In aws-lc, it is just AES-CTR(system RNG), while in BoringSSL, they do AES-CTR(RDRAND) on Intel CPUs and AES-CTR(system RNG) on other CPUs, because apparently they say RDRAND is fast on Intel CPUs.

We do not need to worry about FIPS compliance. However, if there is a desire for further optimization, we could do something like putting RDRAND and maybe some other entropy source through ChaCha8 or ChaCha12. Of course, RDRAND instructions have been defective before and paranoid users might be suspicious that the NSA has a secret RDRAND exploit.

Reading into the code for the aes rng the code there is basically a reference implementation and is completely unoptimized. So it is very possible the OS is doing something far faster than that.

I agree and that's why I believe this PR is also a performance optimization.

The thing is, with randomness you should have a secure default. If the OS csprng is fast enough, I see no reason at all to use it.

Yes, that's part of the point I tried to make above. A secure default is better IMO.


Also, I will hopefully fix the issues you mentioned later today. I have some code that compiles and works on my Linux machine fine but VirtualBox and VMWare are both being buggy and as I have other things to do it might take a while before I can finish debugging them.

colleirose avatar Oct 26 '25 22:10 colleirose

We keep SecureRandom in LibCrypto (and implement any OS-specific things in there)

That's fine with me as long as the RNG function won't be needed in areas that can't use LibCrypto. That concern is why I kept it in AK/ instead.

Work towards removing AK/Random (I'm experimenting with this locally at the moment)

Well, let me know if you figure it out. Doing so would require either removing IDAllocator's dependency on Random.cpp, which I don't believe is practical (though perhaps I'm wrong, I only very briefly looked at the file), or allow using LibCrypto in AK/ which I don't believe is going to happen given it contradicts how the existing architecture has been designed. Regardless, imposing a constraint that we can't use the RNG function in AK/ sounds pointless.

Anything that needs a high performance PRNG can introduce a dedicated class with the name of that generator (e.g. XorshiftGenerator or something) in the future

That's fine but I think you're missing the point. Either we keep it named something like get_random and lose explicitness/clarity or name it something like get_secure_random which could lead to some confusion/uncertainty when someone is working on a file that actually doesn't need secure randomness.

I believe that the best way to address your concern would be naming the RNG function as something like crypto_get_random, crypto_random, crypto_rng, etc. so that it's less ambiguous that it's a secure RNG but wouldn't seem as strange when used somewhere that doesn't need secure randomness, although of course this is subjective and you may think it still would sound the same.

Like R-Goc said: "The thing is, with randomness you should have a secure default. If the OS csprng is fast enough, I see no reason at all to use it." And while I know you're not disputing that the operating system CSPRNG is fast enough, I think it seems to be acceptable to use here as a general-purpose secure default.

With that, I think we should keep OpenSSL as a fallback (which would be impossible if we moved this to AK) so that other, less used platforms can generate secure random values where we lack OS-specific calls.

The problem with this is that 1) RAND_bytes is slow and inefficient (see above discussion), and 2) I don't believe there are any targets where that fallback would be needed. If someone wants to add a new target they would need to update AK/Random to use its native CSPRNG, which IMO is fairly reasonable.

colleirose avatar Oct 26 '25 22:10 colleirose

I ran a benchmark on windows of the current fill_with_random vs fill_with_secure random implementation. The test was 20 runs of filling a total of 1GB of data, at different chunk sizes. For each chunk size there was a warmup of 10 runs of both interleaved. (I can share the code, but it is quite ugly and non-portable, due to an issue with AK::MonotonicTime on windows I had to use the performace counter directly). https://gist.github.com/R-Goc/cfc91fb86aa21332576844118866174e

R-Goc avatar Oct 27 '25 17:10 R-Goc

My assumption is that OpenSSL maintaining its own entropy pool is faster for larger accesses as getting data from the RAM and CPU cache in that case is faster than the system call, but the system call is faster otherwise.

I guess the most efficient way to do this would be to maintain an entropy pool for large sizes that's mixed with a faster CSPRNG like ChaCha8 or ChaCha12, using it for large allocations and using the syscall (or maybe RDRAND?) otherwise.

I'll spend some time looking into this I suppose

colleirose avatar Oct 27 '25 20:10 colleirose

Still we are multiple GB/s. So more than enough.

R-Goc avatar Oct 27 '25 20:10 R-Goc

I think what's most relevant would be latency for things like Math.random(), where we don't need very high throughput but a fast response could be useful. I'll need to look into it more later.

I am pretty sure I could do it but it doesn't seem to matter too much. Maybe in the future tho, I just don't have enough time right now.

colleirose avatar Oct 27 '25 22:10 colleirose

Your pull request has conflicts that need to be resolved before it can be reviewed and merged. Make sure to rebase your branch on top of the latest master.

github-actions[bot] avatar Oct 28 '25 10:10 github-actions[bot]

Fixed the merge conflicts and the lint issue. Can confirm it's working on Linux, but my VM is still very laggy and so I can't test Windows myself yet though I'll keep trying what I can.

colleirose avatar Oct 28 '25 20:10 colleirose

I had attempted to rename the branch... whatever.

colleirose avatar Oct 28 '25 20:10 colleirose

Your pull request has conflicts that need to be resolved before it can be reviewed and merged. Make sure to rebase your branch on top of the latest master.

github-actions[bot] avatar Dec 01 '25 11:12 github-actions[bot]

Your pull request has conflicts that need to be resolved before it can be reviewed and merged. Make sure to rebase your branch on top of the latest master.

I already have a very simple fix for the merge conflict but I'm going to test everything locally again before committing it just to be sure. I'm a bit busy right now so this should be done later today hopefully.

colleirose avatar Dec 01 '25 21:12 colleirose

Merge conflict is fixed and I've verified that everything works correctly

colleirose avatar Dec 02 '25 01:12 colleirose

Your pull request has conflicts that need to be resolved before it can be reviewed and merged. Make sure to rebase your branch on top of the latest master.

github-actions[bot] avatar Dec 10 '25 20:12 github-actions[bot]