fix: use portable C++ RNG
Motivation
Fixes https://github.com/NixOS/nix/issues/10541
Context
I took inspiration from "A Tour of C++, Third Edition" (ISBN-10 0136816487) to implement the Random class, which makes it a bit easier for users to generate random numbers using C++'s portable RNG. The use of std::random_device might be overkill and unnecessary, but I was having fun. 😅
I'm also aware that this constantly creates instances of Random, and then instantly throws them away again. I thought about making this a singleton in some way, but then again, maybe the compiler is smart enough about this.
There are other distribution mechanisms that we could use; I just went with std::uniform_int_distribution, since that's what the book uses.
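For readers following along, a minimal sketch of the kind of functor described above might look like this. It is an assumption based on the description (a class named Random that wraps std::default_random_engine, seeds it from std::random_device, samples through std::uniform_int_distribution, and exposes a seed() member), not the code from this PR. `rollDie` is a hypothetical call site that shows the "create, use once, throw away" pattern mentioned above.

```cpp
#include <random>

// Hypothetical sketch, not the class from the PR: a functor whose
// constructor fixes the range and whose call operator draws a value.
class Random
{
public:
    Random(int low, int high)
        : dist{low, high}
    {
    }

    // Draw one value from the inclusive range [low, high].
    int operator()()
    {
        return dist(engine);
    }

    // Re-seed the engine, e.g. with a fixed value in a test.
    void seed(unsigned int s)
    {
        engine.seed(s);
    }

private:
    // Seeded once per instance from std::random_device.
    std::default_random_engine engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist;
};

// Hypothetical call site: a fresh Random (and a fresh random_device
// read) on every call, which is the cost discussed above.
inline int rollDie()
{
    Random die{1, 6};
    return die();
}
```

Keeping one long-lived instance instead of constructing one per call would avoid the repeated std::random_device reads, at the price of having to decide who owns that instance.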
@bryanhonof can you try the thing I suggested in the issue of making IndirectRootStore own the random number generator?
Also, can you remove the srand and srandom we don't need anymore?
> @bryanhonof can you try the thing I suggested in the issue of making IndirectRootStore own the random number generator?
Sure, although I didn't quite understand why it'd need to be part of that t.b.h. I liked the way you could use this implementation as a functor, and have the constructor define the range, giving people working on the Nix codebase a nice way to generate random numbers in the future, without doing the whole init dance. We'll lose that behavior if I'm going to instantiate it in IndirectRootStore, I think.
> Also, can you remove the srand and srandom we don't need anymore?
I believe I already did, rg -- 'srand\(' returned nothing for me from the root of the project.
Added a generic Random as a public member. lmkwyt.
Should I also replace the code in src/libstore/filetransfer.cc with a Random?
https://github.com/NixOS/nix/blob/96ba7f9d77d6f2fd8fd64aafc50dd8c850e8a902/src/libstore/filetransfer.cc#L45-L46
@bryanhonof
> Sure, although I didn't quite understand why it'd need to be part of that t.b.h. I liked the way you could use this implementation as a functor, and have the constructor define the range, giving people working on the Nix codebase a nice way to generate random numbers in the future, without doing the whole init dance. We'll lose that behavior if I'm going to instantiate it in IndirectRootStore, I think.
I think it is good if the call site specifies the distribution; the state is just there to avoid consulting the somewhat icky random device global variable. Feel free to change Random to separate the "storing the seed" part from the "choosing what distribution to sample from" part.
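One possible reading of this split, sketched under assumptions: a small object owns only the seeded engine, and each call site builds whatever distribution it needs. `RandomSource` and `pickDelayMs` are hypothetical names for illustration, not identifiers from the Nix codebase, and the engine type is just the std::default_random_engine already used in this PR.

```cpp
#include <random>

// Hypothetical: stores only the engine state ("storing the seed").
class RandomSource
{
public:
    using Engine = std::default_random_engine;

    explicit RandomSource(Engine::result_type seed)
        : engine(seed)
    {
    }

    // The caller supplies the distribution ("choosing what to sample from").
    template<typename Distribution>
    typename Distribution::result_type sample(Distribution & dist)
    {
        return dist(engine);
    }

private:
    Engine engine;
};

// Hypothetical call site: the range is decided here, not in RandomSource.
inline int pickDelayMs(RandomSource & rng)
{
    std::uniform_int_distribution<int> dist(0, 999);
    return rng.sample(dist);
}
```

With this shape, an owner such as a store object could hold one RandomSource, and no global seeding call would be needed.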
@bryanhonof Sure, feel free to change FileTransfer.
I guess getFileTransfer and makeFileTransfer would need to be changed to pass in the seed?
I don't think that using std::random_device is a good choice. Its behavior is basically undefined:
> std::random_device may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. In this case each std::random_device object may generate the same number sequence.
The question is: Which properties shall this RNG have? For testing a PRNG is useful, i.e. the same seed shall generate the same sequence of numbers. If the generator is non-deterministic, then this is not possible.
Is anything else than uniformly distributed numbers needed?
Please stay away from Mersenne-Twister, especially std::mt19937, because it's huge and slow. If not seeded correctly, then it will output a long sequence of "bad" random numbers.
@NaN-git
> I don't think that using std::random_device is a good choice. Its behavior is basically undefined:
I did see that comment, but I honestly wouldn't know what else to use. Do you maybe think having a second constructor that accepts a std::seed_seq might be better? Or perhaps even a plain int, and use that in combination with some time mechanism?
> The question is: Which properties shall this RNG have? For testing a PRNG is useful, i.e. the same seed shall generate the same sequence of numbers. If the generator is non-deterministic, then this is not possible.
I'm honestly not sure, might be that @Ericson2314 knows?
But when it comes to testing, the class has a seed() member function, so we can just set the seed to a value we know.
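As a rough illustration of the testing property under discussion, the sketch below seeds a standard engine with a fixed value and collects its first few outputs. `firstValues` is a hypothetical helper, and std::minstd_rand is chosen only as an example engine. One caveat worth keeping in mind: the standard specifies the engines' sequences, but leaves the algorithms of distributions such as std::uniform_int_distribution to the implementation, so the raw engine output is the safer thing to compare across platforms.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: the first n raw outputs of an engine seeded with `seed`.
inline std::vector<std::minstd_rand::result_type>
firstValues(std::minstd_rand::result_type seed, std::size_t n)
{
    std::minstd_rand engine(seed);
    std::vector<std::minstd_rand::result_type> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(engine());
    return out;
}

// Same seed, same sequence: this is what makes a seeded PRNG testable.
inline bool sameSeedSameSequence()
{
    return firstValues(42, 16) == firstValues(42, 16);
}
```

A test can then pin the seed and compare against a recorded sequence, which is not possible when every instance is seeded from std::random_device.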
> Is anything else than uniformly distributed numbers needed?
As far as I can see, no. I don't expect to support generating random characters any time soon. Maybe floats or doubles.
> Please stay away from Mersenne-Twister, especially std::mt19937, because it's huge and slow. If not seeded correctly, then it will output a long sequence of "bad" random numbers.
I've used std::default_random_engine, which seems to default to std::minstd_rand, at least on LLVM. The code in src/libstore/filetransfer.cc does use std::mt19937, but I'm trying to replace that with just the Random class.
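The size difference between these engines can be checked directly; this is a small sketch rather than a benchmark, and the constant names are made up for illustration. std::mt19937 keeps a state of 624 32-bit words, while std::minstd_rand keeps a single value. The exact byte counts, and which engine std::default_random_engine aliases, depend on the standard library, so no numbers are shown here.

```cpp
#include <cstddef>
#include <random>

// Object sizes of the engines mentioned in this thread (implementation-specific).
constexpr std::size_t mt19937Size = sizeof(std::mt19937);
constexpr std::size_t minstdSize = sizeof(std::minstd_rand);
constexpr std::size_t defaultEngineSize = sizeof(std::default_random_engine);

// mt19937 carries 624 state words; minstd_rand carries one.
static_assert(mt19937Size > minstdSize,
              "expected std::mt19937 to be larger than std::minstd_rand");
```

Printing the three constants on a given toolchain shows what a per-call or per-object engine actually costs there.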
If it uses std::mt19937 today, I think it could be fine to keep it that way (but also fine to change it). At this point I care more about avoiding global variables for seeding than about how we sample, since the rand/random baseline is also somewhat vaguely defined.
I'm going to push a commit with the things I want.
@bryanhonof I am afraid I am a bit back to the drawing board now that I don't see any PRNG "splitting" of the sort mentioned in https://www.tweag.io/blog/2020-06-29-prng-test/ in the C++ standard library.
There's a related Lix commit that may fix #7273, which you may wish to backport instead:
https://gerrit.lix.systems/c/lix/+/2100