Random string generation API

Open lvxiang opened this issue 9 years ago • 20 comments

Hi, I’ve searched through the Guava API for a random string generator but found none. I’ve also looked at this thread (http://stackoverflow.com/questions/20782919/does-guava-have-a-method-to-generate-random-strings), which suggests using BaseEncoding for the purpose. I’m not a big fan of that solution. For one thing, you have to know Base64 well enough to avoid mistakes such as forgetting to omit padding. Moreover, it’s not a ready-made solution the way most APIs are; you have to think twice to come up with the idea. The solution is also too specific to allow flexibility. Consider the following cases:

  • What if some chars are not supposed to appear in the string? I might want to exclude ‘I’ and ‘l’ because they look very much alike.
  • There is no way to generate a Unicode string.
  • What if I want all letters in upper case?

You have to write your own code to cover the cases above. There are other issues to consider:

  • There’s very likely a solution with better performance than using BaseEncoding.
  • There’s no formal proof of the randomness of the generated strings.

Personally, I often come across requirements for random strings in various formats, and I see a good reason for Guava to provide a dedicated API for generating them. A fluent-style API would fit this situation well. Please let me know your thoughts on this issue.
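
To make the request concrete, here is a rough sketch of what such a fluent API might look like. Everything in it is hypothetical: the class name RandomStrings and the methods of, without, upperCase and next are invented for illustration and exist neither in Guava nor in any library mentioned in this thread. It also assumes the alphabet contains only BMP chars, so it does not solve the Unicode case.

```java
import java.util.Locale;
import java.util.Random;

/** Hypothetical fluent generator; not part of Guava or any existing library. */
final class RandomStrings {
    private final String alphabet;

    private RandomStrings(String alphabet) {
        if (alphabet.isEmpty()) {
            throw new IllegalArgumentException("alphabet must not be empty");
        }
        this.alphabet = alphabet;
    }

    /** Starts from an explicit set of allowed chars (assumed to be BMP chars). */
    static RandomStrings of(String alphabet) {
        return new RandomStrings(alphabet);
    }

    /** Returns a generator whose alphabet omits every char in {@code excluded}. */
    RandomStrings without(String excluded) {
        StringBuilder kept = new StringBuilder();
        for (char c : alphabet.toCharArray()) {
            if (excluded.indexOf(c) < 0) {
                kept.append(c);
            }
        }
        return new RandomStrings(kept.toString());
    }

    /** Returns a generator whose alphabet is upper-cased, with duplicates dropped. */
    RandomStrings upperCase() {
        StringBuilder kept = new StringBuilder();
        for (char c : alphabet.toUpperCase(Locale.ROOT).toCharArray()) {
            if (kept.indexOf(String.valueOf(c)) < 0) {
                kept.append(c);
            }
        }
        return new RandomStrings(kept.toString());
    }

    /** Picks {@code length} chars uniformly from the alphabet using the given source. */
    String next(Random random, int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = alphabet.charAt(random.nextInt(alphabet.length()));
        }
        return new String(out);
    }
}
```

With this sketch, RandomStrings.of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ").without("Il").next(new Random(), 16) would cover the first case above, and .upperCase() the third.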

lvxiang avatar Jul 21 '15 14:07 lvxiang

See http://stackoverflow.com/questions/41107/how-to-generate-a-random-alpha-numeric-string

Construct a BigInteger with N bits randomly obtained from a Random source and encode it in Base-32.
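
A minimal sketch of that idea, assuming a radix-32 rendering via BigInteger.toString(32) is acceptable (that gives the digits 0-9 and a-v, which is not the RFC 4648 Base32 alphabet). The class name BigIntegerToken is made up for this example.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Random;

final class BigIntegerToken {
    /** Draws {@code bits} random bits and renders them in radix 32. */
    static String next(Random random, int bits) {
        return new BigInteger(bits, random).toString(32);
    }

    public static void main(String[] args) {
        // 130 bits fit in at most 26 radix-32 digits.
        System.out.println(next(new SecureRandom(), 130));
    }
}
```

Note that leading zero digits are dropped, so the result can be shorter than 26 chars, and the alphabet cannot be customised.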

Cheers,

Thiago Kronig

thiagokronig avatar Jul 21 '15 21:07 thiagokronig

@thiagokronig I checked that before; it still doesn't solve all the problems.

lvxiang avatar Jul 22 '15 01:07 lvxiang

This answer uses no BigInteger and is pretty general. It also seems to be optimal (except for using StringBuilder where char[] would do).

Maaartinus avatar Jul 22 '15 01:07 Maaartinus

@Maaartinus I don't think it's general enough, as you have to redefine "AB" constantly. Apart from the StringBuilder, it might also not be optimal in some cases; e.g., if all you want is a string of decimal digits, the following code might be faster:

import java.util.Random;
// Builds a string of `length` random decimal digits without any lookup table.
static String randomDigits(int length) {
    Random rng = new Random();
    char[] str = new char[length];
    for (int i = 0; i < length; i++) {
        str[i] = (char) ('0' + rng.nextInt(10));
    }
    return new String(str);
}

lvxiang avatar Jul 22 '15 02:07 lvxiang

I agree with @lvxiang in the sense that usually people have a set of constraints and want to generate strings according to those sets of constraints, without having to redefine them everywhere.

You can have something like a step-based mechanism in which you first define/compile constraints and can then quickly generate strings based on those constraints.

Answers saying that you can "simply" store your string of acceptable characters disregard the need for modularity, reusability and ease of use. The "simply" part is only the core of the issue, and a lot can be done around it to make it better for the developer. It's like reinventing the wheel all over again: we want to avoid that in development ;)

Therefore, all I can recommend today is to check the library Passay or, without wanting to put the spotlight on it, my modest take on the problem.

With these libs, you can declare which generator you want and then generate as many strings as you want very easily.

ogregoire avatar Jul 22 '15 09:07 ogregoire

@ogregoire I like your Rule and Ruler idea; it's almost the same as what I have in mind. You might consider adopting a fluent style by introducing something like a RuleBuilder.

lvxiang avatar Jul 24 '15 08:07 lvxiang

@lvxiang Well, I'm glad you like it. If you have suggestions, please file an issue there and let's continue this conversation there as well.

This issue here should stay focused on the integration of your idea into Guava.

ogregoire avatar Jul 24 '15 09:07 ogregoire

@ogregoire agreed

lvxiang avatar Jul 24 '15 09:07 lvxiang

+1

mrniko avatar Oct 05 '15 08:10 mrniko

+1. I want to get rid of Apache's commons-lang3 but I need org.apache.commons.lang3.RandomStringUtils

kpavlov avatar Jul 01 '16 10:07 kpavlov

+1

bobbui avatar Nov 14 '16 02:11 bobbui

FWIW, this all seems easy enough to do with streams. I threw together this little snippet to print 10 strings of 10 random ASCII characters (though you could choose whatever code points you want):

Random random = new Random();
Stream<String> randomStrings =
    Stream.generate(
        () ->
            random
                .ints('a', 'z')
                .limit(10)
                .collect(
                    StringBuilder::new,
                    (builder, codePoint) -> builder.appendCodePoint(codePoint),
                    StringBuilder::append)
                .toString());
randomStrings.limit(10).forEach(System.out::println);

Filtering, transforming to upper case, etc. are all easy enough to implement as further stream operations. Given that it's straightforward enough to get a random string from the APIs in the JDK, I'm having a hard time imagining that this is such a common problem as to warrant its own, specific API in Guava -- a specific API would be more readable, but probably too niche.

gk5885 avatar Jan 10 '17 03:01 gk5885

I just searched "random string java" in Google and got 381k hits. Doesn't seem very niche to me.

ogregoire avatar Jan 10 '17 03:01 ogregoire

We generally judge need based on evidence within Google's (very large) code base, not Google searches. By that metric, there is a much stronger need for an API to generate random cats given that "random cat java" produces 940k results. :)

gk5885 avatar Jan 10 '17 03:01 gk5885

Okay, you got me there. I'll try to find a good API to generate random cats then ;-)

More seriously, implementations have been offered here, there is a request, and there is a real need from several of your users. All that's left for the Guava team is to review and accept one of the PRs. Leave it in beta for a few releases and see if it's used. If not, drop it.

ogregoire avatar Jan 10 '17 03:01 ogregoire

Oh, by the way, @gk5885, have you seen any 'z' printed by your snippet?

There we go! The problem is more complex than one might expect, and errors happen so easily. Since you work on the Guava team, you're probably a good programmer, and it was rather easy for you to write that snippet. But check everywhere else: the problem is not a trivial one. At least not as trivial as you think. Working with randomness is hard because it's not easily testable, and bugs creep in quickly (yup, even for a talented programmer like you, as we just saw ;).

I'm kind of sad that the only metric for Guava is "we see it often in our codebase". Another good one would be "it seems simple, but it's an order of magnitude harder than that".
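
For reference, the slip is that the upper bound of Random.ints(origin, bound) is exclusive, so ints('a', 'z') can never yield 'z'. A small check along these lines makes that visible; the class and method names are invented for this sketch, and the fixed seed is only there to make runs repeatable.

```java
import java.util.Random;

final class UpperBoundCheck {
    /** Returns true if any of {@code draws} values from [origin, bound) equals {@code target}. */
    static boolean everProduces(int origin, int bound, int target, long draws) {
        return new Random(0).ints(draws, origin, bound).anyMatch(v -> v == target);
    }

    public static void main(String[] args) {
        // 'z' lies outside ['a', 'z'), so this can never be true.
        System.out.println(everProduces('a', 'z', 'z', 100_000));
        // With the bound moved to 'z' + 1, 'z' becomes a possible value.
        System.out.println(everProduces('a', 'z' + 1, 'z', 100_000));
    }
}
```

A check like this only catches range mistakes; it says nothing about how uniform or unpredictable the output is.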

ogregoire avatar Jan 10 '17 04:01 ogregoire

It's not the only metric, but it's an important one. Utility is the other primary consideration, and obviously the difficulty of implementing a correct solution is an aspect of that.

jrtom avatar Jan 10 '17 05:01 jrtom

@ogregoire I'd say that what you wrote is a generator for passwords rather than for arbitrary strings. At least that's how it'd probably be used. But then plain Random must not be the default, as it's insecure.

I guess it could get nearly as popular as random java cats, except that there are many options you didn't cover. (*)

  • For example, I'm using a much simpler generator, but I need variable string length.
  • Some people might require maximum speed and this could be achieved by passing ThreadLocalRandom.current() to next (with your generator they'd need to keep an instance per thread).
  • I recall there was a site requiring a password containing two to four digits (and some other chars) or something like this. Crazy, I know.

(*) And nobody can cover, as there are just too many strange wishes.
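
To illustrate the first two points, here is a minimal sketch of a generator with a variable length whose next method takes the random source as a parameter. The class AlphabetGenerator and its constructor arguments are hypothetical and are not taken from any of the libraries discussed here.

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

final class AlphabetGenerator {
    private final char[] alphabet;
    private final int minLength;
    private final int maxLength;

    AlphabetGenerator(String alphabet, int minLength, int maxLength) {
        if (alphabet.isEmpty() || minLength < 0 || maxLength < minLength) {
            throw new IllegalArgumentException("invalid alphabet or length range");
        }
        this.alphabet = alphabet.toCharArray();
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    /** Uses the caller's source, so one immutable instance can serve every thread. */
    String next(Random random) {
        // Length is drawn uniformly from [minLength, maxLength].
        int length = minLength + random.nextInt(maxLength - minLength + 1);
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = alphabet[random.nextInt(alphabet.length)];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        AlphabetGenerator gen = new AlphabetGenerator("abcdefghjkmnpqrstuvwxyz23456789", 8, 12);
        System.out.println(gen.next(new SecureRandom()));          // for secrets
        System.out.println(gen.next(ThreadLocalRandom.current())); // for speed, not for secrets
    }
}
```

The "two to four digits" kind of rule is deliberately left out; it would need per-class counts rather than a single alphabet.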

Maaartinus avatar Jan 10 '17 06:01 Maaartinus

@Maaartinus Yes, I know! I have somewhere up my sleeve a password generator that can easily take those cases into account, but that's not the point. I tried to cover most of the use cases seen on Stack Overflow. And while the options you mention aren't unknown, they are much rarer (though next(Random) always was my personal favorite, it's a burden on the user).

But then, this is free software (as in speech), and discussions and improvements are welcome, I guess! I never said I hold the one truth, and if I overlooked some things, like SecureRandom as the default, I humbly ask to have them pointed out so I can suggest a better alternative.

@jrtom It's basically the only metric we've heard of in the past few years. Yes, there are others, but that aspect is rather opaque from a point of view external to Google.

ogregoire avatar Jan 10 '17 10:01 ogregoire

Yes, this is old, but here's some better code for this:

    static final Function<Integer, String> randomNumeric = (lim) ->
            new Random()
                    .ints(lim, '0', '9' + 1)
                    .collect(
                            StringBuilder::new,
                            StringBuilder::appendCodePoint,
                            StringBuilder::append)
                    .toString();

    static final Function<Integer, String> randomAlphabetic = (lim) ->
            new Random()
                    .ints(lim, 'a', 'z' + 1)
                    .collect(
                            StringBuilder::new,
                            StringBuilder::appendCodePoint,
                            StringBuilder::append)
                    .toString();

Steve973 avatar Feb 04 '23 01:02 Steve973