easy-random icon indicating copy to clipboard operation
easy-random copied to clipboard

Randomization is not random

Open gliwka opened this issue 4 years ago • 4 comments

Some objects generated by this library are not random. Generated properties and/or properties of child-objects can be inter-dependent making this library unfit for certain kinds of tests. This comes as a suprise and totally breaks my expectations as a user.

Consider the following Laplace experiment:

We toss two coins, a quarter and a dime, for 50,000 times each. We would expect both kinds of coin to show heads and tails a similar number of times (~25,000 times per coin per side)

Let's implement this experiment using a unit test and easy-random:

public class RandomizationTest {
    enum SideFaceingUp {
        HEADS,
        TAILS
    }

    enum CoinType {
        DIME,
        QUARTER
    }

    @Value
    class CoinToss {
        SideFaceingUp sideFaceingUp;
        CoinType coinType;
    }

    @Test
    public void laPlaceExperiment() {
        EasyRandom random = new EasyRandom();
        Map<String, Long> distributions = random.objects(CoinToss.class, 100000)
                .collect(groupingBy(toss -> toss.coinType + " " + toss.sideFaceingUp, counting()));

        System.out.println(distributions);

        assertThat(distributions).containsKeys("DIME HEADS", "DIME TAILS", "QUARTER HEADS", "QUARTER TAILS");
        assertThat(distributions.values()).allSatisfy(distribution -> {
            assertThat(distribution).isCloseTo(25_000, withPercentage(1));
        });
    }
}

This test fails. The distribution looks like this: <{"DIME HEADS"=50022L, "QUARTER TAILS"=49978L}>

It seems like dimes only fall on heads and quarters only on tails. The value of the coin type is linked to which side is facing up.

This is not what I would expect from a random object generator.

gliwka avatar Jan 14 '21 21:01 gliwka

Thank you for opening this issue. Let me start by completely excluding Easy Random from the picture to explain what's happening. Consider the following test using two java.util.Random instances initialized with the same seed to generate random values from each enum which are then used to create CoinToss objects:

@Test
void laPlaceExperimentWithoutEasyRandom() {
    Random coinRandom = new Random(123);
    Random sideRandom = new Random(123);
    int[] randomCoins = coinRandom.ints(100000, 0, 2).toArray();
    int[] randomSides = sideRandom.ints(100000, 0, 2).toArray();
    CoinToss[] coinTosses = new CoinToss[100000];
    for (int i = 0; i < 100000; i++) {
        coinTosses[i] = new CoinToss(
                randomCoins[i] == 0 ? CoinType.DIME : CoinType.QUARTER,
                randomSides[i] == 0 ? SideFaceingUp.HEADS : SideFaceingUp.TAILS);
    }

    Map<String, Long> distributions = Arrays.stream(coinTosses).
            collect(groupingBy(toss -> toss.coinType + " " + toss.sideFaceingUp, counting()));

    System.out.println(distributions);

    assertThat(distributions).containsKeys("DIME HEADS", "DIME TAILS", "QUARTER HEADS", "QUARTER TAILS");
    assertThat(distributions.values()).allSatisfy(distribution -> {
        assertThat(distribution).isCloseTo(25_000, withPercentage(1));
    });
}

This test fails in the same way as Easy Random. The point of this test is to mimic the behaviour of Easy Random. Since two java.util.Random instances initialized with the same seed generate the same uniformly distributed sequence of numbers, we end-up with the same sequence of combinations of {CoinType, SideFaceingUp}.

Now when Easy Random generates a random enum, it stupidly takes a random number from the range [0, NbElements - 1] and returns the enum having that value as index (See EnumRandomizer). Now here's the most important detail: Easy Random uses a different EnumRandomizer for each enum type, however it initializes them all with the same seed (taken from EasyRandomParameters, which defaults to 123) to guarantee a reproducible sequence of objects.

Now if you use two different seeds for coinRandom and sideRandom in the previous test, the test passes. In Easy Random, this would be assigning two different EnumRandomizers (ie with different seeds) to each enum type. The following test passes:

@Test
public void laPlaceExperiment() {
    EasyRandomParameters parameters = new EasyRandomParameters()
            .randomize(CoinType.class, new EnumRandomizer<>(CoinType.class, 123))
            .randomize(SideFaceingUp.class, new EnumRandomizer<>(SideFaceingUp.class, 456));
    EasyRandom random = new EasyRandom(parameters);
    Map<String, Long> distributions = random.objects(CoinToss.class, 100000)
            .collect(groupingBy(toss -> toss.coinType + " " + toss.sideFaceingUp, counting()));

    System.out.println(distributions); // prints: {DIME HEADS=24877, QUARTER HEADS=25045, DIME TAILS=25145, QUARTER TAILS=24933}

    assertThat(distributions).containsKeys("DIME HEADS", "DIME TAILS", "QUARTER HEADS", "QUARTER TAILS");
    assertThat(distributions.values()).allSatisfy(distribution -> {
        assertThat(distribution).isCloseTo(25_000, withPercentage(2));
    });
}

Does this make sense? I hope this clarifies what's happening behind the scene and explains the observed behaviour.

Now I don't know if this could/should be documented and where to put in the wiki, but I'm open to suggestions.

fmbenhassine avatar Jan 16 '21 21:01 fmbenhassine

I believe that this the same behaviour I reported there: https://github.com/j-easy/easy-random/issues/404 @benas while your arguments are correct, I'd like to encourage you to reconsider the decision as it is just counterintuitive for (at least some) users. Pleasae note, that in your example the new Random(123); is executed twice and hence the repetitions are not surprising. In @gliwka example, there is only one new EasyRandom(), and presumably this is why the expectations are different.

mjureczko avatar Jan 22 '21 14:01 mjureczko

@mjureczko yes, my example explicitly uses two different instances of Random to explain what's happening behind the default EasyRandom random = new EasyRandom() in @gliwka 's example. Now if I have to explain the internals of ER here, this means there is something missing in the Javadocs or in the reference docs. Which brings me to the next point.

Some objects generated by this library are not random. Generated properties and/or properties of child-objects can be inter-dependent making this library unfit for certain kinds of tests

"Some" and "certain" are key here. Easy Random is not meant to cover all cases by default, as mentioned in the readme:

In most cases, default options are enough and you can use the default constructor of EasyRandom.

The initial example uses an EasyRandom instance with the default parameters. These sensible defaults work as expected for the majority of uses cases, but definitely not all of them (and this not the goal in the first place, nor the goal of any library or framework). If the defaults do not work for you, you can customize the behaviour as shown above.

Now here is the point: if the default behaviour is not explicit enough in the docs and might be surprising, then I'm open to improve the docs as needed.

fmbenhassine avatar Jan 22 '21 20:01 fmbenhassine

@benas Thank you for the explanation. I'll get back to you on this issue on the weekend.

gliwka avatar Jan 25 '21 13:01 gliwka

@gliwka

Thank you for the explanation. I'll get back to you on this issue on the weekend.

You did not mention which weekend, so you are covered 😄

Joke aside, I hope I clarified your concern. As mentioned previously, using a custom randomizer or assigning different enum randomizers is the way to go if the defaults are not suitable.

I am closing this issue for now, but feel free to add a comment if needed.

fmbenhassine avatar Sep 17 '23 18:09 fmbenhassine

@fmbenhassine Good one 😂 Not mentioning which weekend, got to remember that for the future.

As suggested, In 2021 I‘ve configured separate randomizers with their own seeds.

gliwka avatar Sep 18 '23 06:09 gliwka