Linux random number generator updates
This includes the POSIX interface getentropy, which is simpler to use than getrandom; in glibc it has been available for as long as getrandom, and it was part of OpenBSD before that. https://pubs.opengroup.org/onlinepubs/9799919799/
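For illustration, here is a minimal sketch of a getentropy call, assuming glibc 2.25 or later (where it is declared in `<unistd.h>`); error handling is kept deliberately simple:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getentropy() is declared here in glibc >= 2.25 */

int main(void) {
    unsigned char seed[32];   /* getentropy() allows at most 256 bytes per call */

    /* Unlike getrandom(), there are no flags to pick: getentropy()
       blocks until the kernel pool is initialized, then either fills
       the buffer completely or fails. */
    if (getentropy(seed, sizeof seed) != 0) {
        perror("getentropy");
        return EXIT_FAILURE;
    }
    /* seed[] now holds cryptographically secure random bytes,
       suitable for seeding another random generator. */
    return EXIT_SUCCESS;
}
```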
This patch set also removes the long discussion about /dev/random and /dev/urandom, which I loved, but today these interfaces behave similarly. https://github.com/torvalds/linux/commit/30c08efec8884fb106b8e57094baa51bb4c44e32
Any update on the merging status of this? The intent of the proposal is to make this section apply to modern kernels and reduce the semantic complexity discussed. I find significant value in giving a simple story about the current semantics of the devices, as a newcomer to Linux will not have to burden their limited mental load with details that are only of historical interest. The only "change" is the mention of getentropy because (correct me if I'm wrong) it serves the same goal of simplification.
I verified the POSIX inclusion: https://pubs.opengroup.org/onlinepubs/9799919799/functions/getentropy.html
The Linux kernel text looks correct to me. I've asked Greg KH to verify the Linux kernel situation. Yes, I should have done that a while ago, but I'm on the case now :-).
Greg KH asked me to look at the docs & talk directly with the kernel implementers. First, looking at available docs...
POSIX does include getentropy, and it looks great. It's not just paperware: it's clearly documented in the Linux getentropy man page as of glibc 2.25 (as well as on OpenBSD). We generally refer to standardized interfaces (as long as they're actually available), so that looks good.
The /dev stuff needs more research. It's true that commit https://github.com/torvalds/linux/commit/30c08efec8884fb106b8e57094baa51bb4c44e32 from 2020 was merged some time ago. However, problems with the unification were noted in 2022. So I want to check and make sure that these statements about /dev/*random are correct.
I believe the people I eventually need to contact about Linux kernel random number generation are Jason A. Donenfeld and Theodore Ts'o. Jason seems to have been more active on it recently, and tytso has a long background with it.
The most authoritative source is the code itself. I've identified the key source file to be reviewed: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/random.c?h=v6.12.4
The problem is that /dev/random blocks, and in many cases blocking is a much worse problem than the risk it is supposed to mitigate. See: “Myths about /dev/urandom”
The urandom(4) man page says:
> When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered. A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature... If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys. ... Users should be very economical in the amount of seed material that they read from /dev/urandom (and /dev/random)...
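For illustration, a hedged sketch of the traditional device-based approach the man page describes; the short-read and EINTR handling below is needed because a read from a device file may return fewer bytes than requested:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fill buf with len random bytes from /dev/urandom.
   Returns 0 on success, -1 on error. */
static int read_urandom(unsigned char *buf, size_t len) {
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;              /* e.g., a container lacking the device node */
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;      /* short reads are possible: keep looping */
    }
    close(fd);
    return 0;
}
```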
The same problem hits getentropy, at least on Linux, because it also blocks. Per the getentropy Linux man page:
> A call to getentropy() may block if the system has just booted and the kernel has not yet collected enough randomness to initialize the entropy pool. In this case, getentropy() will keep blocking even if a signal is handled, and will return only once the entropy pool has been initialized.
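For contrast, a hedged sketch of the non-blocking variant under discussion: getrandom() with GRND_NONBLOCK fails with EAGAIN instead of hanging while the pool is uninitialized (Linux-specific; the glibc wrapper appeared in 2.25):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK */

/* Try to obtain len random bytes without ever blocking.
   Returns 0 on success, -1 if the pool isn't initialized yet
   or another error occurred. */
static int get_random_nonblocking(unsigned char *buf, size_t len) {
    ssize_t n = getrandom(buf, len, GRND_NONBLOCK);
    if (n < 0) {
        if (errno == EAGAIN) {
            /* Pool not yet initialized: this is exactly the
               availability-vs-security trade-off point; the caller
               must decide whether to wait, fail, or degrade. */
            return -1;
        }
        perror("getrandom");
        return -1;
    }
    /* Once the pool is initialized, requests of up to 256 bytes
       are filled completely and are not interrupted by signals. */
    return (size_t)n == len ? 0 : -1;
}
```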
Randomness is quite technical and challenging. I've proposed some alternative text, building on what you proposed earlier. Once we agree on something, I intend to bring it to the Linux kernel developers who focus on the random number generator, to get their take.
Security is confidentiality, integrity, and availability. If a system can't start, nobody cares about it. So we need to make it clear that there's a trade-off. What's more, in a lot of cases the secrets are short-lived (e.g., session keys) and the protections provided by entropy estimates are usually not worth making the system completely fail.
> Randomness is quite technical and challenging. I've proposed some alternative text, building on what you proposed earlier. Once we agree on something, I intend to bring it to the Linux kernel developers who focus on the random number generator, to get their take.
>
> Security is confidentiality, integrity, and availability. If a system can't start, nobody cares about it. So we need to make it clear that there's a trade-off. What's more, in a lot of cases the secrets are short-lived (e.g., session keys) and the protections provided by entropy estimates are usually not worth making the system completely fail.
I see that the availability aspect is important to you, and rightfully so, but it introduces additional complexity that the reader of the guidance would have to manage. These are the key points that initially led me to simplify the guidance to an unconditional endorsement of getentropy, and thus towards simplicity:
- During my early days at Red Hat we identified certain VMs that sometimes had identical SSH private keys. That was at a time when /dev/urandom was the only practical choice. It led me to believe that an uninitialized random generator is something that is very, very hard to use in a reasonable way (not just securely).
- Whether the generator is initialized or not is a nuance that very few people can really understand, and even fewer can claim to have securely used an uninitialized random generator (should they wait until there are 4 bits or 64 bits of entropy? what if some bits were fixed instead of random? and how would they do all that?). Given that the intention of a kernel-provided random generator is to seed other random generators, which in turn can be used for any purpose including generating long-term keys, the use-it-uninitialized option stands on a very shaky foundation.
As such, I was led to believe that the only practical recommendation for the average developer is to avoid all nuances and details: getentropy will always be the right choice for the type of random generators that are intended to seed others.
At the same time, availability is important, but I do not feel that getrandom() with the non-blocking flag is the unconditional answer; in the case of OpenSSH keys, for example, it can produce output that is predictable and has a long-lasting effect. Those keys will remain predictable even years after they were generated.
The availability issue can arise when a system has not gathered enough entropy after boot (e.g., on custom-designed boards or in VMs, as in the example I used above). In these cases, using getrandom() with the non-blocking option will hide the issue during the design phase and prevent a proper fix (e.g., using something like virtio-rng on a VM or a hardware random generator on a board). During the production phase, admittedly, it can prevent the (un)availability issue.
As such, my recommendation would be to cover the potential availability issue, maybe with a pointer, as after writing the above I'm not sure that getrandom() and /dev/urandom are even solutions (as opposed to quick hacks) to the availability issue for this type of random generator.
PS: This expresses my point of view rather than a text suggestion. If you also see value in the aspects above and we get on the same page, I can propose something.
We're talking about "no/low estimated entropy", which isn't quite the same as "uninitialized" in its usual sense. In many cases the kernel doesn't credit entropy, and believes it's 0 or low, yet there's no known way for an attacker to predict the random value. In particular, per urandom(4): "Writing to /dev/random or /dev/urandom will update the entropy pool with the data written, but this will not result in a higher entropy count." This is a common case: distros generally write out a seed on shutdown and reload it early in the boot sequence, so the system can "start where it left off" with a seed no one else can know, even though its estimated entropy is 0.
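To make the seed-reload step concrete, here is a hedged sketch of what a distro's boot script effectively does, written in C for consistency (the seed file path is hypothetical); as the man page text above says, a plain write mixes the data into the pool but does not raise the entropy count:

```c
#include <fcntl.h>
#include <unistd.h>

/* Mix a previously saved seed back into the kernel pool at boot.
   A plain write() contributes the data but credits no entropy;
   crediting would require the privileged RNDADDENTROPY ioctl. */
static int reload_seed(const char *seed_path) {   /* path is hypothetical */
    unsigned char seed[512];
    int in = open(seed_path, O_RDONLY);
    if (in < 0)
        return -1;                  /* no seed was saved on last shutdown */
    ssize_t n = read(in, seed, sizeof seed);
    close(in);
    if (n <= 0)
        return -1;

    int out = open("/dev/urandom", O_WRONLY);
    if (out < 0)
        return -1;
    ssize_t w = write(out, seed, (size_t)n);   /* mixed in, not credited */
    close(out);
    return w == n ? 0 : -1;
}
```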
> During my early days at Red Hat we identified certain VMs that sometimes had identical SSH private keys. That was at a time when /dev/urandom was the only practical choice.
Obviously that's bad :-). We already cover that, though: SSH keys are a case where we recommend the use of blocking calls, like /dev/random.
I agree that this is tricky :-). We have a fundamental trade-off, and I think we should clearly present it as a trade-off. If a system is unavailable, that's often a non-starter. One of the most authoritative pieces of guidance about this trade-off on Linux is the text from urandom(4):
> If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys. If a seed file is saved across reboots as recommended below (all major Linux distributions have done this since 2000 at least), the output is cryptographically secure against attackers without local root access as soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys. Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
I guess one option is to directly quote this as part of the text.
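For reference, a hedged sketch of the man page's "open it in nonblocking mode (or perform a read with timeout)" suggestion, using poll() for the timeout; the notification policy is left to the caller:

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Read up to len bytes from /dev/random, waiting at most timeout_ms.
   Returns the number of bytes read, or -1 on error or timeout. */
static ssize_t read_random_with_timeout(unsigned char *buf, size_t len,
                                        int timeout_ms) {
    int fd = open("/dev/random", O_RDONLY | O_NONBLOCK | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0) {               /* 0 = timed out, <0 = error */
        close(fd);
        return -1;                  /* caller should notify the user */
    }
    ssize_t n = read(fd, buf, len); /* POLLIN was set, so this won't block */
    close(fd);
    return n;
}
```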
> We're talking about "no/low estimated entropy", which isn't quite the same as "uninitialized" in its usual sense. In many cases the kernel doesn't credit entropy, and believes it's 0 or low, yet there's no known way for an attacker to predict the random value.
That's certainly correct, and since you have checked the internals recently you have a better overview than I had. However, my concern is whether the Linux kernel makes a distinction between an uninitialized RNG and an RNG whose entropy is estimated as zero. Without such a distinction, it seems very hard to define what to expect from the output at this early boot stage.
(And to be clear, it is not my position that the old /dev/random behavior was good, when it had the notion of entropy depletion and could block at any time, effectively causing a DoS; my focus is on the current behavior, where the RNG only unblocks once it considers itself initialized, irrespective of whether I agree with the criteria chosen.)
> In particular, per urandom(4): "Writing to /dev/random or /dev/urandom will update the entropy pool with the data written, but this will not result in a higher entropy count." This is a common case: distros generally write out a seed on shutdown and reload it early in the boot sequence, so the system can "start where it left off" with a seed no one else can know, even though its estimated entropy is 0.
Let's not forget that it is only an assumption that this happens (being in the embedded world right now has changed my perspective a little). Maybe that's a good point to include in the text.
> Obviously that's bad :-). We already cover that, though: SSH keys are a case where we recommend the use of blocking calls, like /dev/random.
> I agree that this is tricky :-). We have a fundamental trade-off, and I think we should clearly present it as a trade-off. If a system is unavailable, that's often a non-starter. One of the most authoritative pieces of guidance about this trade-off on Linux is the text from urandom(4):
> If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys. If a seed file is saved across reboots as recommended below (all major Linux distributions have done this since 2000 at least), the output is cryptographically secure against attackers without local root access as soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys. Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
>
> I guess one option is to directly quote this as part of the text.
To my understanding, you suggest splitting the recommendation based on use cases such as:
- long-term effects, such as keys (or any purpose unknown at the time)
- short-term effects, such as short-lived keys
Thinking about it, this separation is not enough. In the second case we only have an assumption that using the short-term key will have a short-term effect. If that short-term key is used to transport the "crown jewels", the assumption of suitability goes away (again, given that there is no strict definition of the strength of the non-blocking behavior). It feels that any distinction between blocking and non-blocking usage must come with very concrete assumptions, which in turn will limit the usefulness of the advice.
And let me propose some text to show how a use-case-based recommendation could look:
## Recommendations based on use case
### 1. Seeding a random generator for any use including long-term key generation
- Use `getentropy`; if it is unavailable (e.g., in shell scripts), prefer `/dev/random`.
### 2. Seeding a random generator for purposes with no long-term effects
- For tasks where cryptographic security is required, but availability during the early boot stage trumps security and there is no long-term effect from the use of the seed, also consider `/dev/urandom` or `getrandom` with the non-blocking flag.
NOTE: When designing a new system based on the Linux kernel, before considering the non-blocking variants, ensure that enough entropy is gathered during boot by identifying the hardware entropy sources and making sure they contribute to the kernel entropy pool.
The second case would need an example, but I'm not sure I can find a useful one. I hope this demonstrates a little better why I think the usefulness of advice no. 2 is limited.
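Regarding the NOTE in the proposed text above, here is a hedged sketch of one way to check during board or VM bring-up whether the pool is actually being fed; the proc path is the standard Linux one, and the threshold is purely illustrative:

```c
#include <stdio.h>

/* Print the kernel's current entropy estimate; useful when verifying
   that hardware sources actually feed the pool on a new board or VM. */
int main(void) {
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
    if (!f) {
        perror("entropy_avail");
        return 1;
    }
    int bits = -1;
    if (fscanf(f, "%d", &bits) != 1)
        bits = -1;
    fclose(f);

    printf("kernel entropy estimate: %d bits\n", bits);
    if (bits >= 0 && bits < 256)   /* illustrative threshold only */
        printf("low estimate: check that sources such as virtio-rng "
               "contribute to the pool\n");
    return 0;
}
```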
Part 1: I agree that using blocking calls for long-term keys is the general consensus. However, getentropy is a relatively new call; on Linux systems, /dev/random and getrandom are more often available. There's no strong reason to avoid /dev/random on Linux systems where it's available, though; it gets you the same thing. The only reason would be a container setup that didn't create the device for some reason, which would be a pretty odd setup.
Part 2: The "early boot" isn't quite right. The Linux kernel developers are pretty conservative about what counts as entropy - and understandably so. As a result, the kernel can report "0 entropy" when in fact there's no way an attacker can determine the seed value (so it's not REALLY 0 entropy). It's not clear how to fix this; the kernel devs only have so much information and it is reasonable for them to be conservative. However, this also means that many systems hang if you force them to use blocking calls. It's frustrating. I'd love to say "always use blocking calls for cryptographic randomness" - it is much easier to understand - but the real world makes that impractical to simply require.
Quick note: It may not look it, but I really appreciate this discussion & deep dive into random number generation. It seems simple at first, but this is an area that is absolutely fraught with complications. Many successful attacks have involved the cryptographic random number generator, so it matters. On the other hand, making entire systems fail because they think they don't have enough entropy is a great way to ensure that the system, and any thought of security, is removed immediately. Threading these issues is complex.
> Quick note: It may not look it, but I really appreciate this discussion & deep dive into random number generation. It seems simple at first, but this is an area that is absolutely fraught with complications. Many successful attacks have involved the cryptographic random number generator, so it matters. On the other hand, making entire systems fail because they think they don't have enough entropy is a great way to ensure that the system, and any thought of security, is removed immediately. Threading these issues is complex.
Same here, and this is a difficult case. There is a need for balance between being conservative (suggesting blocking) and being practical (non-blocking), and I, at least, rely more on "gut" feeling than on clear data. What I cannot tell is whether the blocking issue as described in myths-about-urandom is still as relevant today as when it was written. Your suggestion is a good balance, and I'll update the MR with it. Would it be OK to remove the reference to myths-about-urandom, given that the urandom/random environment was significantly different back then? The myths document provides very nice historical context for the devices, but to someone learning Linux and security today it will be of limited practical value.
I've included the text you suggested, without the reference to myths-about-urandom, since it is not an actionable reference today.
Thanks. I've asked Greg KH if he can take a look.
Seems sane to me, nice discussion!
But really you should ask Jason as he's the random maintainer and did all the work in the past few years on that codebase in the kernel.
Fair enough. I've sent a request to Jason Donenfeld to review this proposed change.
I want to be very cautious about changes in this area. Doing it wrong can lead to hard-to-detect nasty vulnerabilities. It can also lead to systems that don't work, which isn't exactly a positive outcome :-).
> Fair enough. I've sent a request to Jason Donenfeld to review this proposed change.
>
> I want to be very cautious about changes in this area. Doing it wrong can lead to hard-to-detect nasty vulnerabilities. It can also lead to systems that don't work, which isn't exactly a positive outcome :-).
I understand your concerns, though with your latest proposal incorporated, the essence of the text remains identical to the original, with the addition of getentropy to the discussion and the removal of a duplicate paragraph. As a recommendation I still view it as erring on the conservative side and warning about indefinite blocking. Removing the warning about indefinite blocking was my original intention, but after our discussion that change is no longer present in the text.
In short, from what I read, I see no big change in the text except the mention of getentropy from POSIX.