scala-collection-contrib icon indicating copy to clipboard operation
scala-collection-contrib copied to clipboard

Rename MultiSet to Bag, MultiDict => MultiMap?

Open joshlemer opened this issue 6 years ago • 16 comments
trafficstars

Bag is a lot less verbose than MultiSet, and MultiDict is also pretty awkward to try and say aloud.

joshlemer avatar Apr 25 '19 13:04 joshlemer

The name MultiDict was chosen to not conflict with the existing MultiMap.

Is Bag an established name for sets containing possibly multiple occurrences of their elements?

julienrf avatar Apr 25 '19 14:04 julienrf

Bag does appear to be an established name:

https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset https://en.wikipedia.org/wiki/Multiset https://commons.apache.org/proper/commons-collections/userguide.html#Bags

but you're likely right, that MultiSet is more commonly used, I'm not sure.

That's too bad about the MultiMap conflict.

joshlemer avatar Apr 25 '19 14:04 joshlemer

This does raise the question of why we need a mutable.MultiMap in stdlib as well as a mutable.MultiDict in collections-contrib. Is there an end game of deprecating the stdlib version?

joshlemer avatar Apr 25 '19 18:04 joshlemer

I’m in favor of deprecating mutable.MultiMap but anyways we need to make both solutions cohabit together during the transition period.

julienrf avatar Apr 26 '19 08:04 julienrf

Do we need both a counted set and a set which retains original copies? Although both can occasionally be useful, I would tend to say that most uses of the latter would just as well be served by a MultiMap. It's only the former that is really awkward and inefficient.

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So :-1: for MultiSet to Bag naming, assuming that MultiSet is a counted set. If it is not a counted set, :-1: for having it at all.

Also :-1: to introducing a name collision between the old MultiMap trait and a new MultiMap that may have both immutable and mutable variants. MultiDict is awkward, though. In principle, :+1: for finding something better. ManyMap or KeyBag perhaps. If KeyBag, then I guess MultiSet as a plain Bag is okay. Or maybe have all three: CountedBag, Bag, and KeyBag. But they will all be distinct APIs, I suspect, so maybe Bag is again bad.

Ichoran avatar Apr 29 '19 20:04 Ichoran

The MultiSet here is a counted multiset: https://github.com/scala/scala-collection-contrib/blob/master/src/main/scala/scala/collection/MultiSet.scala#L50-L53

I prefer MultiDict over KeyBag or ManyMap. https://www.google.com/search?q=multidict

julienrf avatar Apr 30 '19 08:04 julienrf

Thanks @Ichoran ! Some points:

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So šŸ‘Ž for MultiSet to Bag naming

So if I interpret correctly, you're saying that Bag is unintuitive, because when you insert an element into it, you expect that exact object (up to reference identity) should be found in the resulting Bag. This critique is fair enough in isolation but in Scala-land the ethos is generally that we don't care about reference equality and only care about value equality. For instance, you might have made the same critique of Sets -- that you expect that when you put an apple in a Set and get it back out, you get the same one you put in.

Now for some quick points for the name change:

  • Bag has 3 letters vs MultiSet's 8, so it's much more convenient to type, which makes a pretty big difference when we treat collection apply as "literal" syntax (Bag(...)). The length savings is the same difference as Seq vs Sequence for instance. This may seem like a silly point, but anecdotally, I've even heard of people using Seq over List just to save one char.

  • While MultiSet is one accepted term and possible the most popular, Bag is not too much less popular, and avoids potentially confusing people into thinking that a MultiSet is some kind of special case of Set (a la HashSet/TreeSet/etc), when in fact MultiSets/Bags have nothing to do with Sets in the Iterable hierarchy, and do not behave like Sets.

  • I think that the Bag analogy very nicely visualizes what it is we're describing. A bag is an unordered collection of data, and a bag is equal to an other bag if and only if the two contain the same elements (including arity), ignoring order. The term Bag really drives home that order is ignored, IMO, because bags are all jumbled up/unstructured inside.

joshlemer avatar Apr 30 '19 16:04 joshlemer

You can't get elements out of sets, and if you manage to, you do get one that you put in (if you put in duplicates, you obviously only get one of them out).

The bag is acting more like a bank: you deposit an item, and you get out an equivalent item, but not necessarily the one you put in. (Though it's weirder if counted--it will give you back the same item multiple times, rather than drawing separate instances from a vault of available instances.)

The point about the length of the name is a good one, though. That may be important enough to override all the other concerns.

Ichoran avatar Apr 30 '19 23:04 Ichoran

Oh, also, next time my groceries end up packed all jumbled up/unstructured, I will remind myself that it is because they were put in a bag :joy:

Ichoran avatar Apr 30 '19 23:04 Ichoran

I’m convinced that Bag is a good name for counted sets. But the problem is more on the side of MultiDict. I’m not sure that KeyBag makes sense here. A MultiDict (or MultiMap) associates a set of values to a key.

julienrf avatar May 01 '19 13:05 julienrf

@julienrf I also don't think KeyBag is an improvement over MultiDict. MultiMap is probably best, save for the name collision. If I had to think of a different name, and I do prefer to go with practical/"blue collar" names over academic ones, I might consider something like OneToMany but I'm not sure that's necessarily any better than MultiDict (though it has exact same length). šŸ¤·ā€ā™‚

joshlemer avatar May 01 '19 13:05 joshlemer

Copied from #98

TLDR; I think it should be named SetMultiMap

I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.

  1. It helps remove the @deprecation warning and name conflict with the 2.12/2.13 MultiMap
  2. "MultiMap" is a more universally used terminology for this type of collection. (e.g. Guava has ArrayListMultimap, ListMultimap, SetMultimap, and etc; C++ STL is std::multimap; Rust is MultiMap, Apache Commons is MultiMap, etc)
  3. The existing name also does support or convey the collection type of the key's values. (e.g. it currently implements Set, but there is no way to extend it to use an Array, Vector or List instead)
  4. The code still uses MultiMap in various locations (like in the hashcode!), and the CC[_] is MapFactory (e.g. not DictFactory)
  5. There are no other Dictionary terms in any of the Scala collections. It's Map for any scala code, only the java converters have the mention of a dictionary. MultiDict isn't scala idiomatic.

er1c avatar Jun 30 '20 14:06 er1c

@er1c I agree that MultiMap and MultiSet (I would prefer Bag) should be abstract data types (akin to Seq, Map, Set). I also tried to point this out in https://github.com/scala/scala-collection-contrib/issues/25. I think that everyone agrees that 'MultiDict' is much worse a name than MultiMap, but it was named that way to avoid collisions with the stdlib implementation.

I agree that there would be valid usecases for MultiMaps where the underlying value collection is a Set or a Seq or even a Bag. Perhaps the best way to achieve this would be to utilize the existing Iterable Factories so that users can pass in whatever kind of collection they like? Maybe this is possible:

trait MultiMap[K, V, C[_]] { ... }

object MultiMap {
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}

// user code
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2, 1 -> 3)

Perhaps some reasonable default value collection should be chosen (probably Set?) so that users can still initialize a MultiMap without specifying the value:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}


// user code:

MultiMap(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Set]
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, List]
MultiMap.withValuesIn(Bag)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Bag]

If that route is taken then I would suggest making a static factory for each of the most common collection types so that we avoid allocating on each MultiMap creation, so:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  private final val ListMultiMapFactory = ???
  private final vall SetMultiMapFactory = ???
  private final val VectorMultiMapFactory = ???
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = factory match {
    case List => ListMultiMapFactory
    case Vector => VectorMultiMapFactory
    case Set => SetMultiMapFactory
    case other => new MultiMapFactory(other)
  }
}

Care should also ideally be taken to design the api so that users that just want to use the default collection don't have to specify the higher kinded type parameter everywhere. This is the same principle that was applied to the new views which make them much more ergonomic (so, View[Int] rather than View[Int, List[Int]]). Maybe MultiMap[K, V] extends a more general trait like MultiMapWithValues[K, V, C[_]].

Or maybe I'm just over complicating things, just giving ideas.

joshlemer avatar Jun 30 '20 15:06 joshlemer

trait MultiMap[K, V, C[_]] { ... } šŸ‘

This is exactly what I was thinking. I think the only other option is MultiMap[K, V, C <: Iterable[V]]

My brain starts going into CanBuildFrom implicits when thinking about it, so I probably need to brush up more on the 2.13 collections to see a better builder strategy :)

er1c avatar Jun 30 '20 15:06 er1c

abandoned draft PR: #24

SethTisue avatar Oct 27 '20 14:10 SethTisue

Bagwell.

som-snytt avatar Nov 13 '21 02:11 som-snytt