scala-library-next icon indicating copy to clipboard operation
scala-library-next copied to clipboard

Add groupByTo and groupMapTo to IterableOnceOps

Open BalmungSan opened this issue 4 years ago • 6 comments

Similar to #8 the only reason why Iterators (or IterableOnce in general) do not have groupBy & groupMap is because, you can not have a Map[K, Iterator[V]] without consuming the Iterator as such the values of the Map would be empty.

But, since groupBy & groupMap already consume the Iterator I would say nobody would expect the values to be a lazy collection, I would even say that most people would be ok with either List or Seq. However, I would guess that the best would be just giving users the possibility to choose which collection to use to collect the results; this even opens the possibility to a somewhat common requirement of grouping unique values by using a Set.

So the proposal is to add the following two methods to IterableOnceOps so they are not only available on Iterator but for all collections.

def groupByTo[C, K](factory: Factory[A, C])(key: A => K): Map[K, C]
def groupMapTo[C, K, B](factory: Factory[B, C])(key: A => K)(f: A => B): Map[K, C]

BalmungSan avatar Dec 03 '20 13:12 BalmungSan

Another operation a bit similar would be:

trait IterableOps[A, CC[_], C] {
  def groupByTo[MC[_, _], K](factory: Factory[(K, CC[A]), MC[K, CC[A]]])(f: A => K): MC[K, CC[A]]
}

The motivation would be to be able to put the results in a TreeMap without computing an intermediate Map, for instance:

users.groupByTo(TreeMap)(_.age)

julienrf avatar Dec 03 '20 16:12 julienrf

@julienrf oh that is also a good idea! I think both could be done in the same operation and leaving the normal Map as a default, something like:

def groupByTo[CC[_], MC[_, _], K](factory: Factory[A, CC[A]], mapFactory: MapFactory[MC] = Map)(key: A => K): MC[K, CC[A]]

And I think it is absolutely worth it. I may have a couple of errors in the signature but as soon as #23 is merged I will give it a try to this one (even if it isn't formally accepted by then, unless it was formally rejected).

BalmungSan avatar Dec 03 '20 16:12 BalmungSan

I do think we should try to add as few ops that allow the most things, even if you have to specify a few more parameters in some cases (e.g. identity)

NthPortal avatar Dec 04 '20 08:12 NthPortal

Ok, I started to prototype this and I have a small API problem.

I defined the method like this:

def groupMapTo[K, B, C1, MC](key: A => K)(f: A => B)
                            (colFactory: Factory[B, C1])
                            (mapFactory: Factory[(K, C1), MC] = immutable.Map): MC = {

Which works great if you use it like this:

data.groupMapTo(_.country)(_.user)(colFactory = SortedSet)(mapFactory = SortedMap

However, this has two "problems"

  1. The most important one, If I just want a normal Map then I need to add an empty parenthesis at the end (), which looks quite ugly.
  2. I believe it feels more natural to put the factories first and then the key and mapping functions; especially because the mapping function can be quite big.

I originally tried to define the function like this:

def groupMapTo[K, B, C1, MC](colFactory: Factory[B, C1], mapFactory: Factory[(K, C1), MC] = immutable.Map)(key: A => K)(f: A => B): MC = {

However, the problem here is type inference. (Scala 3 allows to at least have both factories in the same last group, mitigating problem 1)

Does anyone have any ideas?

BalmungSan avatar Dec 07 '20 17:12 BalmungSan

What do you all think of an API like this:

trait GroupMap[K, V, CC[_], MC[_, _]] {
  def collectValuesAs[CC1[_]](factory: Factory[V, CC1[V]]): GroupMap[K, V, CC1, MC]
  def collectResultsAs[MC1[_, _]](factory: Factory[(K, V), MC1[K, V]]): GroupMap[K, V, CC, MC1]

  def result: MC[K, CC[V]]
  def reduce(reduce: (V, V)): MC[K, V] // Optional.
}

def groupMapTo[K, V](key: A => K)(f: A => V): NextIterableOnceOpsExtensions.GroupMap[K, V, CC, Map] = ???

So it can be used like this:

data
  .groupMapTo(_.country)(_.user)
  .collectValuesAs(SortedSet)
  .collectResultsAs(SortedMap)
  .result

This way in the future all the current methods like groupBy or groupMapReduce can be implemented in terms of this. This also opens the possibility of new methods like groupReduce and getting the results as a SortedMap.

BalmungSan avatar Dec 07 '20 18:12 BalmungSan

This looks promising!

julienrf avatar Dec 07 '20 19:12 julienrf