cats-effect
cats-effect copied to clipboard
Make `UUIDGen` more performant
As of https://github.com/typelevel/cats-effect/pull/2880 it delegates to Sync#blocking.
Some ideas in https://github.com/typelevel/cats-effect/issues/2865#issuecomment-1062567227 to improve the status quo:
While
UUID.randomUUIDis blocking, we can do better:
- Try to create a
SecureRandom.getInstance("NativePRNGNonBlocking")."NativePRNG"appears to be equivalent onnextByteson Unix, but that's platform specific.- Until Java 9, there's still a mutex, but a pool fixes that.
http4s/http4s#6138 is another hidden instance of this problem. Maybe what we really need is a way of safely and efficiently generating random bytes. UUIDGen, and this http4s use case, could both build on that.
Yes, I've wondered if we need a SecureRandom[F] type class that extends Random[F]. But btw, at what point does this start scope-creeping outside of CE3? If we get a SecureRandom it will need a JS implementation, which gets into bobcats territory with different implementations for Node.js and browsers and a CI matrix spanning all of them.
Ross has demoed his proposal in https://github.com/http4s/http4s/pull/6165.
I rather like Ross' proposal.
I reformed it a bit to add the pool. This is a win when we get caught in a mutex, which we always will in Java 8. In Java 9, if we have a non-blocking but threadsafe instance, we probably don't want that pool. But detecting thread safety is super gross.
Would SecureRandom have any additional methods, or just a marker trait that it's, like, secure and stuff?
I can imagine a marker trait being useful, so a random client can express that it needs something stronger.
I was thinking marker trait. I see the JDK SecureRandom adds a couple additional methods like getAlgorithm, no idea how useful those are.
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/security/SecureRandom.html
I have found getAlgorithm nice for debugging. The next would be great, because I already needed bits instead of bytes, and had to go from bytes back to bits. But Java's not going to give us that.
Btw, another good reason to implement UUIDGen in terms of our own SecureRandom is that at least on Scala.js, the UUID.randomUUID() method is not cryptographically secure. Actually that seems like a pretty serious issue...
https://github.com/scala-js/scala-js/blob/058532aa8c504b76431b40e3e1b51b2cdef87643/javalib/src/main/scala/java/util/UUID.scala#L139
Btw, another good reason to implement UUIDGen in terms of our own SecureRandom is that at least on Scala.js, the UUID.randomUUID() method is not cryptographically secure. Actually that seems like a pretty serious issue...
Uh yes, that's a relatively serious issue.
Hey!
I want to take this ticket, I'm just wondering what would be the final solution here. Should we have something like this?
SecureRandom[F].nextBytes(16)
Using our own implementation of SecureRandom and them reimplement the steps done on UUID.randomUUID() ?
public static UUID randomUUID() {
SecureRandom ng = Holder.numberGenerator;
byte[] randomBytes = new byte[16];
ng.nextBytes(randomBytes);
randomBytes[6] &= 0x0f; /* clear version */
randomBytes[6] |= 0x40; /* set to version 4 */
randomBytes[8] &= 0x3f; /* clear variant */
randomBytes[8] |= 0x80; /* set to IETF variant */
return new UUID(randomBytes);
}
Hi @diogocanut, great! Good question, I think there are two changes we need to make:
-
Exactly like you say, implement our own
UUIDGen[F]usingSecureRandom[F]. -
Optimize
SecureRandom[F]itself. Currently it is blocking which is bad for performance, but it doesn't have to be. In this PR Ross demonstrates how we can attempt to acquire a non-blocking secure random implementation. We should port that strategy to Cats Effect. https://github.com/http4s/http4s/pull/6165
Hm, I'm wondering if the NativePRNGNonBlocking part is still relevant? I'm trying to find some up to date info on it, and it seems to be solved even on Linux: https://lwn.net/Articles/884875/.
(Of course the other thing, JVM 8 using synchronized is certainly relevant on JVM 8.)
Uhh, NativePRNGNonBlocking also seems to have a lock...
See also https://bugs.openjdk.org/browse/JDK-8278371.
Here's a potentially mad idea ... could we just implement our own secure random by reading from /dev/urandom? It's an ordinary file so we shouldn't need any JNI or native calls to use it. And even if we did still need some kind of locking, at least we could do it with Mutex so it's only fiber-blocking.
I thought about that, but it seems like a can of worms (although if someone wants to open it, I definitely won't stop them):
- What to do on exotic platforms, where there is no
/dev/urandom(like... Windows)? - I believe
/dev/urandomstill happily returns non-random data shortly after boot (under certain circumstances). I'm not sure how "shortly" that happens (e.g., is it observable from a JVM at all), but it seems somewhat scary.- Btw, I have the same concern about
NativePRNGNonBlocking.
- Btw, I have the same concern about
- As far as I know, the alternative
/dev/randomis fixed on modern Linux, but on older versions it blocks when it feels like it.
- What to do on exotic platforms
Fallback to the current scheme: use JDK SecureRandom with blocking(...)
- I believe
/dev/urandomstill happily returns non-random data shortly
Yeah ... and I think you are right that NativePRNGNonBlocking would have similar issue.
- As far as I know, the alternative
/dev/randomis fixed on modern Linux, but on older versions it blocks when it feels like it.
Yeah, this seems the most promising avenue, based on the article you linked.
In light of above, my best ideas are:
-
check for modern linux and use
/dev/random, otherwise fallback toblockingwith JDKSecureRandom -
give up and just use
blockingwith JDKSecureRandom🤷
Oh, right, here it's possible to just fall back to blocking... that sounds good to me. A possible (but somewhat convoluted, maybe even fragile) third option:
- When creating the
SecureRandom, a. read a little data from/dev/random(inblocking): this will make sure the kernel RNG is initialized b. write this data into/dev/urandom(indelay): this will make sure thaturandomis also initialized, even if it's using a different RNG from/dev/random(I think older Linux might do this?) c. read from/dev/urandomfornextBytesand similar (indelay): it should be safe after b. d. (if any of a./b. fails, fall back to aSecureRandomwithblocking)
I think this option has the advantage of also working on older Linux.
(Fallback could be smarter, e.g., detect if we have a sufficiently modern OS that the dance with reading-writing is not necessary, ...)
Also, there is a JDK built-in SecureRandom called Windows-PRNG, which uses a Windows API call to get random numbers. Which sounds good, but I could not find any info on whether that call is blocking.