scala-library-next icon indicating copy to clipboard operation
scala-library-next copied to clipboard

Add randomElement

Open Lasering opened this issue 3 years ago • 9 comments

Hey, I propose the method randomElement to be added to Iterable. This method returns a random element from the collection.

On Iterable it can be (probably naively) implemented with:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  iterator.drop(n).next
}

On Seq:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  apply(n)
}

A version receiving the Random/seed could also be added:

def randomElement(random: Random): A = ???
def randomElement(seed: Long): A = ???

Or maybe the Random/seed could be an Option (this way there would no overloads):

def randomElement(random: Option[Random] = None): A = ???

I'm certainly not the right person to know which approach is better.

Lasering avatar Jul 05 '22 17:07 Lasering

This seems like the kind of thing that should go into a random number library, not a collections library. Seems awfully specialized. What if you want to choose k of n things? What if you want k things with replacement?

This seems to me just one of a variety of random sampling tasks that are useful. (Note that "shuffle" is also in this class, and that is provided in the random number library.)

Ichoran avatar Jul 05 '22 17:07 Ichoran

Given my comments about a version receiving the random/seed I understand your comments. I shouldn't have suggested that. However the simple version is very handy to have, at least these languages think so:

  • Ruby - the method is called sample.
  • Swift - is called randomElement and is available for collections
  • Rust - its called choose

Lasering avatar Jul 05 '22 21:07 Lasering

Adding the method to Random would already be an improvement, although the ergonomics wouldn't be as good as adding it to Iterable.

Lasering avatar Jul 06 '22 09:07 Lasering

Adding Python to languages that have this feature in their standard library.

julian-a-avar-c avatar Mar 29 '24 20:03 julian-a-avar-c

@Ichoran while I agree this should belong to Random rather than the collections framework, especially considering the need for a seed. While shuffle is general enough to represent this behavior, it feels unnecessarily expensive if you either just want a single element or want to keep getting multiple random elements when repetition is allowed. Having said that, I guess the signature must either be Option[A] or throw if the collection is empty.

BalmungSan avatar Mar 30 '24 14:03 BalmungSan

@BalmungSan - I didn't mention shuffle because I thought you should implement selection with shuffle! I just mentioned it because it indicates that Random is a natural home for methods like these.

It's fine to add Random.sample. I was just suggesting it not be quite so specific. For instance, r sample xs might pick one element of xs, but r.sample(xs, 5) might pick five, and r.sample(xs, 5, replace = true) might pick five that could repeat.

Ichoran avatar Mar 31 '24 03:03 Ichoran

@Ichoran ah fair and apologies for the misunderstanding. Yeah, that API feels great, the only missing thing would be the semantics of what happens if the n (the number of requested elements) is bigger than m (the size of the collection). Also, if the 1 case should be an overload with a different return type or just the default for m and get a collection of size 1.

BTW, my point was also that AFAIK this repo was not limited to only the collections framework, but the stdlib in general. And with plans for unfreezing the stdlib for future versions of Scala 3 I guess is great to re-activate the discussions here.

BalmungSan avatar Mar 31 '24 13:03 BalmungSan

Yeah, rather than using a default argument, I'd recommend sample(coll): A but sample(coll, n): Coll[A] would make sense. In the spirit of safe operations by default, if you ask for m > n items, I'd say you would write m = h + k * n where k >= 1 and h < n, and return k shuffles of the original collection plus h randomly selected items. That is, you use everything at random; if you run out, everything is available again and you keep going.

I'll add something like this to my kse3 library, and hopefully will report back if I run into any problems.

Ichoran avatar Apr 01 '24 04:04 Ichoran

@BalmungSan - I implemented something for arrays which generalizes fine to IndexedSeq. Then for things that aren't arrays, I implemented Algorithm L which doesn't even need to know how big the target is, but has a somewhat high constant cost (due to exponents/logs being used every step).

It is live in kse3 maths 0.2.13, and I decorated the collections themselves with sample (in addition to putting it in the random number generator I wrote). No major hitches save for the usual ones when you want extension methods on collections (extension methods don't always understand the inheritance hierarchy).

Here's a runnable example (using the variants that take a given random number source): https://scastie.scala-lang.org/dS0La51UQxm9Gu3nRzWjAA

Anyway, no significant problems!

I ended up using rng.sample(n)(coll) for argument order for the method on the random number generator, but coll.sample(n)(rng) for the order on collection extension method.

Ichoran avatar Apr 15 '24 21:04 Ichoran