RandomLib icon indicating copy to clipboard operation
RandomLib copied to clipboard

Enforcing resulting random string contents

Open tomasfejfar opened this issue 6 years ago • 6 comments

Somewhat typical situation with a password is that it should not be totally random - it should for example contain one number, one lowercase character and one uppercase character. Or it should contain a special character.

Currently the passwords are generated entirely randomly. So let's say I wanted to create a password that would match following regexes: ~[0-9]~, ~[a-f]~. Then with $generator->generateString(12, 'abcdef0123456789'); you'll get a password aaaabbbbcccc or something like that that wouldn't match the expected format as it's missing the numbers. It's certainly possible to test the generated password and try again unless it passes, but that may well lead to approximately endless loop.

Do you think it would be possible to extend the generator to add an option to generate random strings that pass typical requirements (contains numbers, uppercase, lowercase, special char)? Or what approach would you suggest in such scenarios?

I'm worried that userland implementations (like replacing random character with rand(0,9) if number is missing) could mean potential security issues and it's something this library tries to prevent.

tomasfejfar avatar Aug 01 '18 12:08 tomasfejfar

Currently our use case is that we need to generate user's password for Redshift (we need to supply the password when creating the user). The Redshift's password restrictions is not something we can change.

It must be 8 to 64 characters in length. It must contain at least one uppercase letter, one lowercase letter, and one number. It can use any printable ASCII characters (ASCII code 33 to 126) except ' (single quote), " (double quote), :, , /, @, or space.

There is a workaround to supply MD5 hash instead but there are certainly many other cases where there is no workaround.

tomasfejfar avatar Aug 01 '18 13:08 tomasfejfar

It's certainly possible to test the generated password and try again unless it passes, but that may well lead to approximately endless loop.

So, this is how it would be implemented internally should this be done. It's the only way to do certain types of constraints without risking biasing the algorithm. For example, these lines are in the code now (https://github.com/ircmaxell/RandomLib/blob/master/lib/RandomLib/Generator.php#L239-L242):

do {
    $test   = $this->generate($bytes);
    $result = hexdec(bin2hex($test)) & $mask;
} while ($result > $range);

So for your case, I would simply do something like:

do {
    $password = $generator->generateString(12, 'abcdef0123456789');
    $test = preg_match('~[0-9]~', $password);
    $test2 = preg_match('~[a-f]~', $password);
    // etc
} while(!$test || !$test2 /*...etc...*/);

Could it theoretically run in an infinite loop? Absolutely. But the chances of that happening are exceedingly small. For the given situation, each character has a 6/16 chance of being a letter, and a 10/16 chance of being a number. So all numbers would be (10/16)^12 ~= 0.35% chance. All letters would be (6/16)^12 .0007733% chance. For a single run. Since we don't care which fails, we can add them:

chance_of_rehash_for_one_execution = (10/16)^12 + (6/16)^12
chance_of_rehash_for_one_execution = .003560447

So pretty small. But what about 10 executions? Well, we need to raise that answer to the power of 10, since we need all 10 to hit that situation:

chance_of_rehash_for_10_executions = chance_of_rehash_for_one_execution ^ 10
chance_of_rehash_for_10_executions = 3.27374806e-25

Meaning that there's a 0.0...0327% chance of that loop executing 10 times, where there are 22 0's in that.

This is assuming an unbiased generator.

Now, using redshifts example, it needs to take one upper case, one lower case, and one digit, and the search space is 86 characters (assuming you allow all possible characters they allow). So:

chance_of_no_uppercase = (60/86)^size
chance_of_no_lowercase = (60/86)^size
chance_of_no_digit = (76/86)^size

So the overall chance of at least one failing:

chance_of_any_failure = chance_of_no_uppercase + chance_of_no_lowercase + chance_of_no_digit
chance_of_any_failure = (60/86)^size + (60/86)^size + (76/86)^size
chance_of_any_failure = 2*(60/86)^size + (76/86)^size

So for different sized generations:

chance_of_failure_for_8_chars = 48%
chance_of_failure_for_16_chars = 28%

And for 10 "tries" (chance of 1 failure)^number_of_tries:

for_8_chars = 0.0709%
for_16_chars = 0.0000004%

So you can see it gets insanely low incredibly quickly...

So yeah, my suggestion is just to retry in a loop. To code this "pattern" logic in would be hard to generalize short of accepting an array of regular expressions. Though if you want to provide a pull request that looks like it has a reasonably clean API to do that, I'd definitely consider it.

ircmaxell avatar Aug 01 '18 13:08 ircmaxell

Hi, that would be validation after you already created a random password. How about you try to do it before? You have your flags in Generator.php right? You could try and build something like:

$generator->generateString(10, CHAR_ALNUM | CHAR_SYMBOLS);

In here you could create a random string for each flag. Then break each string apart, combine them all together shuffled and return the desired length. Of course this would be the idea behind it. But it would prevent a loop where you generate something, validate your mask (regex or something) and regenerate and so on...

mischabraam avatar Aug 21 '18 09:08 mischabraam

@tomasfejfar you could do it yourself with existing code of this library. I needed this to and solved it like this.

$password =
    $generator->generateString(3, Generator::CHAR_UPPER).
    $generator->generateString(4, Generator::CHAR_LOWER).
    $generator->generateString(4, Generator::CHAR_DIGITS).
    $generator->generateString(2, Generator::CHAR_SYMBOLS);

$characters = array_filter(str_split($password));
shuffle($characters);
$password = implode("", $characters);

mischabraam avatar Aug 21 '18 09:08 mischabraam

Unfortunately that's not random.

The resulting password will always have 3 upper letters, 4 lower, 4 digits and 2 symbols. Also I wouldn't be sure about the randomness of the shuffle method. Because the order of the generated parts is always the same I'd argue that overal this would make the password much more easily predictable.

You could randomly select one of the "parts" and pick one character from it and you could check that each part has at least once been used, but compared to that just generating in loop until you get the correct format (and throw in case you didn't get a valid in 1000 tries or something) seems like a better alternative.

Generally best option would be to have a method that would generate INTs 1-4 times password length. Ensuring that each int 1,2,3,4 is at least present once and then use the respective generator to create that character. But sill - that still depends on the fact that you need to make sure each of the ints is present at least once. And that means still a while loop. So no, still no better then just do the whole thing in while loop.

tomasfejfar avatar Aug 21 '18 11:08 tomasfejfar

There are lots of ways to generate biased passwords (and hence partially predictable, or at best with less "randomness" than it appears on the surface).

The "retry in a loop" is one of the few methods that is seen to "always work" without biasing. There are other ways of doing it without a while loop, but biases are far harder to test for and tell.

Which is why the "retry in loop" paradigm is so heavily used.

ircmaxell avatar Aug 21 '18 12:08 ircmaxell