scala-collection-contrib
scala-collection-contrib copied to clipboard
Rename MultiSet to Bag, MultiDict => MultiMap?
Bag is a lot less verbose than MultiSet, and MultiDict is also pretty awkward to try and say aloud.
The name MultiDict was chosen to not conflict with the existing MultiMap.
Is Bag an established name for sets containing possibly multiple occurrences of their elements?
Bag does appear to be an established name:
https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset https://en.wikipedia.org/wiki/Multiset https://commons.apache.org/proper/commons-collections/userguide.html#Bags
but you're likely right, that MultiSet is more commonly used, I'm not sure.
That's too bad about the MultiMap conflict.
This does raise the question of why we need a mutable.MultiMap in stdlib as well as a mutable.MultiDict in collections-contrib. Is there an end game of deprecating the stdlib version?
Iām in favor of deprecating mutable.MultiMap but anyways we need to make both solutions cohabit together during the transition period.
Do we need both a counted set and a set which retains original copies? Although both can occasionally be useful, I would tend to say that most uses of the latter would just as well be served by a MultiMap. It's only the former that is really awkward and inefficient.
However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So :-1: for MultiSet to Bag naming, assuming that MultiSet is a counted set. If it is not a counted set, :-1: for having it at all.
Also :-1: to introducing a name collision between the old MultiMap trait and a new MultiMap that may have both immutable and mutable variants. MultiDict is awkward, though. In principle, :+1: for finding something better. ManyMap or KeyBag perhaps. If KeyBag, then I guess MultiSet as a plain Bag is okay. Or maybe have all three: CountedBag, Bag, and KeyBag. But they will all be distinct APIs, I suspect, so maybe Bag is again bad.
The MultiSet here is a counted multiset: https://github.com/scala/scala-collection-contrib/blob/master/src/main/scala/scala/collection/MultiSet.scala#L50-L53
I prefer MultiDict over KeyBag or ManyMap. https://www.google.com/search?q=multidict
Thanks @Ichoran ! Some points:
However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So š for MultiSet to Bag naming
So if I interpret correctly, you're saying that Bag is unintuitive, because when you insert an element into it, you expect that exact object (up to reference identity) should be found in the resulting Bag. This critique is fair enough in isolation but in Scala-land the ethos is generally that we don't care about reference equality and only care about value equality. For instance, you might have made the same critique of Sets -- that you expect that when you put an apple in a Set and get it back out, you get the same one you put in.
Now for some quick points for the name change:
-
Baghas 3 letters vsMultiSet's 8, so it's much more convenient to type, which makes a pretty big difference when we treat collectionapplyas "literal" syntax (Bag(...)). The length savings is the same difference asSeqvsSequencefor instance. This may seem like a silly point, but anecdotally, I've even heard of people usingSeqoverListjust to save one char. -
While
MultiSetis one accepted term and possible the most popular,Bagis not too much less popular, and avoids potentially confusing people into thinking that aMultiSetis some kind of special case ofSet(a la HashSet/TreeSet/etc), when in fact MultiSets/Bags have nothing to do with Sets in the Iterable hierarchy, and do not behave like Sets. -
I think that the
Baganalogy very nicely visualizes what it is we're describing. A bag is an unordered collection of data, and a bag is equal to an other bag if and only if the two contain the same elements (including arity), ignoring order. The termBagreally drives home that order is ignored, IMO, because bags are all jumbled up/unstructured inside.
You can't get elements out of sets, and if you manage to, you do get one that you put in (if you put in duplicates, you obviously only get one of them out).
The bag is acting more like a bank: you deposit an item, and you get out an equivalent item, but not necessarily the one you put in. (Though it's weirder if counted--it will give you back the same item multiple times, rather than drawing separate instances from a vault of available instances.)
The point about the length of the name is a good one, though. That may be important enough to override all the other concerns.
Oh, also, next time my groceries end up packed all jumbled up/unstructured, I will remind myself that it is because they were put in a bag :joy:
Iām convinced that Bag is a good name for counted sets. But the problem is more on the side of MultiDict. Iām not sure that KeyBag makes sense here. A MultiDict (or MultiMap) associates a set of values to a key.
@julienrf I also don't think KeyBag is an improvement over MultiDict. MultiMap is probably best, save for the name collision. If I had to think of a different name, and I do prefer to go with practical/"blue collar" names over academic ones, I might consider something like OneToMany but I'm not sure that's necessarily any better than MultiDict (though it has exact same length). š¤·āā
Copied from #98
TLDR; I think it should be named SetMultiMap
I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.
- It helps remove the @deprecation warning and name conflict with the 2.12/2.13
MultiMap - "MultiMap" is a more universally used terminology for this type of collection. (e.g. Guava has
ArrayListMultimap,ListMultimap,SetMultimap, and etc; C++ STL isstd::multimap; Rust isMultiMap, Apache Commons isMultiMap, etc) - The existing name also does support or convey the collection type of the key's values. (e.g. it currently implements
Set, but there is no way to extend it to use anArray,VectororListinstead) - The code still uses
MultiMapin various locations (like in the hashcode!), and theCC[_]isMapFactory(e.g. notDictFactory) - There are no other Dictionary terms in any of the Scala collections. It's
Mapfor any scala code, only the java converters have the mention of a dictionary.MultiDictisn't scala idiomatic.
@er1c I agree that MultiMap and MultiSet (I would prefer Bag) should be abstract data types (akin to Seq, Map, Set). I also tried to point this out in https://github.com/scala/scala-collection-contrib/issues/25. I think that everyone agrees that 'MultiDict' is much worse a name than MultiMap, but it was named that way to avoid collisions with the stdlib implementation.
I agree that there would be valid usecases for MultiMaps where the underlying value collection is a Set or a Seq or even a Bag. Perhaps the best way to achieve this would be to utilize the existing Iterable Factories so that users can pass in whatever kind of collection they like? Maybe this is possible:
trait MultiMap[K, V, C[_]] { ... }
object MultiMap {
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}
// user code
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2, 1 -> 3)
Perhaps some reasonable default value collection should be chosen (probably Set?) so that users can still initialize a MultiMap without specifying the value:
object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
/// implement factory...
// additional method
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}
// user code:
MultiMap(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Set]
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, List]
MultiMap.withValuesIn(Bag)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Bag]
If that route is taken then I would suggest making a static factory for each of the most common collection types so that we avoid allocating on each MultiMap creation, so:
object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
/// implement factory...
// additional method
private final val ListMultiMapFactory = ???
private final vall SetMultiMapFactory = ???
private final val VectorMultiMapFactory = ???
def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = factory match {
case List => ListMultiMapFactory
case Vector => VectorMultiMapFactory
case Set => SetMultiMapFactory
case other => new MultiMapFactory(other)
}
}
Care should also ideally be taken to design the api so that users that just want to use the default collection don't have to specify the higher kinded type parameter everywhere. This is the same principle that was applied to the new views which make them much more ergonomic (so, View[Int] rather than View[Int, List[Int]]). Maybe MultiMap[K, V] extends a more general trait like MultiMapWithValues[K, V, C[_]].
Or maybe I'm just over complicating things, just giving ideas.
trait MultiMap[K, V, C[_]] { ... } š
This is exactly what I was thinking. I think the only other option is MultiMap[K, V, C <: Iterable[V]]
My brain starts going into CanBuildFrom implicits when thinking about it, so I probably need to brush up more on the 2.13 collections to see a better builder strategy :)
abandoned draft PR: #24
Bagwell.