pharo icon indicating copy to clipboard operation
pharo copied to clipboard

atRandom & integer agnosticism

Open Ducasse opened this issue 7 years ago • 7 comments

Hello community.

Kindly consider:

| n i r |
n := ( 2 ** 256 ) - 1 .  "the largest 256-bit value"
i := ( 0 to: n ) .  "the interval 0 to 2**256-1"
"Produce three sets of computed test results."
3 timesRepeat: [
   Transcript show: ( r := i atRandom ) ; cr .
   Transcript show: ( r asByteArray hex ) ; cr . ]

According to comments in Pharo library code, atRandom fails for integer magnitudes greater than 2**30 . A reason for this limitation is not given, and it's not clear to a novice such as myself why the computed result of atRandom is permitted to sometimes be in conflict with the semantics of atRandom .

Can anyone provide a basic explanation of the rationale for a 'broken' atRandom ?

Alternatively, is this a known bug?

Thanks. dr

Ducasse avatar Feb 11 '19 11:02 Ducasse

Here is the answer of NICE

Try it in Squeak and pick the relevant methods. Open a bug report and commit the fix. Of course, original authorship will somehow be spoiled, so be kind and cite them in commit message.

Ducasse avatar Feb 11 '19 11:02 Ducasse

Why go via Interval ?

Why not just

(2 ** 256) atRandom.

?

IM(H)O Interval as it currently stands is a mess, it is an endless source of bug reports for years now. Because it is not clear what it is and is not. People use them with all kinds of numbers (incl inexact floats) and expect all collection methods to work.

On 11 Feb 2019, at 12:13, StéphaneDucasse [email protected] wrote:

Hello community.

Kindly consider:

| n i r | n := ( 2 ** 256 ) - 1 . "the largest 256-bit value" i := ( 0 to: n ) . "the interval 0 to 2**256-1" "Produce three sets of computed test results." 3 timesRepeat: [ Transcript show: ( r := i atRandom ) ; cr . Transcript show: ( r asByteArray hex ) ; cr . ]

According to comments in Pharo library code, atRandom fails for integer magnitudes greater than 2**30 . A reason for this limitation is not given, and it's not clear to a novice such as myself why the computed result of atRandom is permitted to sometimes be in conflict with the semantics of atRandom .

Can anyone provide a basic explanation of the rationale for a 'broken' atRandom ?

Alternatively, is this a known bug?

Thanks. dr

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

-- Sven Van Caekenberghe - mailto:[email protected] Beta Nine - software engineering - http://www.beta9.be

svenvc avatar Feb 11 '19 11:02 svenvc

The "random" index for atRandom is, basically, computed prngOutput mod: selectionRange.

For large selection ranges, it fails because the period of the PRNG used by atRandom is 2^30th, with the output an integer in the range 1 - 2^30th. (in other words, the prng is a repeating sequence of every number between 1 and 2^30th)

That means:

  • There will be bias when the range you pick from approaches 2^30th.
  • There will be gaps (indexes that won't ever be picked) once you pass 2^30th.

The reason is techical; the default RNG is a reasonable compromise between speed, memory, and usable range. If the intended use requires a larger range, an appropriate random number generator can be provided using atRandom: It certainly could give an error and/or select another RNG automatically if outside applicable range, but no one has been bothered enough to do either, yet.

Cheers, Henry

Ducasse avatar Feb 11 '19 18:02 Ducasse

Thanks for this link. Note that there are quite good classical PRNG already programmed for Pharo in package Polymath including a Mersenne twister and Marsaglia KISS as used in gfortran... (I did not check recently but there should be a catalog entry for Polymath). The one programmed in Squeak is a Mersenne Twister I think, carefully written for maximizing performance for 32bits VM (with 31bit SmallIntegers).

IMO, the Park-Miller PRNG was a good default choice in the 70s when it was introduced in Smalltalk. But now, we have fast enough machines to not bother with its limitations.

Le lun. 11 févr. 2019 à 15:58, David Richards < david.i.richards.iii at gmail.com> a écrit :

Thanks Nicolas and Henry,

For now, simple alternative expressions can easily generate the values I need for my project, such as:

(String new: 32) collect: [ :each | '0123456789abcdef' atRandom ]

Maybe in a few months (or years!) from now I will have acquired enough ambient familiarity with Pharo to attempt a fix. But, for the time being, I'll just avoid stepping on the landmine.

In doing some research on this problem I spotted this, which seemed to possibly point a way toward a cutting edge RNG implementation for Pharo:

http://xoshiro.di.unimi.it/#intro

[image: image.png]

Ducasse avatar Feb 12 '19 09:02 Ducasse

If all you need is a 64-bit PRNG, then SplitMix64 is a good candidate both for speed (uses bit shifting, xoring and multiplication only) and it's very robust (statistically speaking).

http://xorshift.di.unimi.it/splitmix64.c

bstjean avatar Mar 31 '19 10:03 bstjean

In Pharo 12 I got

33665478090738534551265947698821085861731431479409397989099502699963737964544
4a6dfc4000000000000000000000000000000000000000000000000000000000
56737219946914414999965920440123632085542434921225913774821309228306904645632
7d7020e400000000000000000000000000000000000000000000000000000000
9899038960990396535394384800841929201551291814850384676665448694206894702592
15e2a85200000000000000000000000000000000000000000000000000000000

Ducasse avatar Jul 30 '24 07:07 Ducasse

In P9

61525305964451392265031629257731411204474240315985566218943821361314187444224
880616d1100c3000000000000000000000000000000000000000000000000000
32460455300860748330048361511611026813408964515496646286299724575483742912512
47c3f7748f87f000000000000000000000000000000000000000000000000000
66339844569971116430435079087868031799733900749859557373570528374113358053376
92ab057b25560800000000000000000000000000000000000000000000000000

Ducasse avatar Jul 30 '24 07:07 Ducasse