scala-collection-contrib
scala-collection-contrib copied to clipboard
Possible bug in implementation of `MapDecorator.mergeByKeyWith`.
Seen in
0.3.0
Reproduction
Scala worksheet, so assertions are verified by eye.
Using mergeByKey
for convenience, knowing it delegates to mergeByKeyWith
.
import scala.collection.decorators.mapDecorator
val arthur = "arthur.txt"
val tyson = "tyson.txt"
val sandra = "sandra.txt"
val allKeys = Set(arthur, tyson, sandra)
val sharedValue = 1
val ourChanges = Map(
(
arthur,
sharedValue
),
(
tyson,
2
)
)
val theirChanges = Map(
(
arthur,
sharedValue
),
(
sandra,
3
)
)
ourChanges -> theirChanges
ourChanges.mergeByKey(theirChanges)
// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys
theirChanges.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
.mergeByKey(theirChanges)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
.mergeByKey(ourChanges)
.keys
val theirChangesRedux = Map(
(
arthur,
sharedValue
),
(
sandra,
sharedValue // <<<<------- Ahem!
)
)
ourChanges -> theirChangesRedux
ourChanges.mergeByKey(theirChangesRedux)
// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys
theirChangesRedux.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
.mergeByKey(theirChangesRedux)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
.mergeByKey(ourChanges)
.keys
Expectation
The Scaladoc for mergeByKey
states:
Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.
So all the keys from ourChanges
and theirChanges
should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges
and theirChanges
should appear in the associated pairs, wrapped in Some
.
Observed
ourChanges.mergeByKey(theirChangesRedux)
drops key sandra
, whereas theirChangesRedux.mergeByKey(ourChanges)
preserves all keys.
Discussion
There is a set, traversed
that is used to prevent duplicate key processing when switching from iterating over from coll
to other
. It is however typed as a set over the value type W
of other
, and is populated with values, not keys.
In the example above, the use of a shared value triggers the problem, causing the entry for key sandra
to be dropped.
I am assuming this was a typo and not the desired behaviour.
If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.
I'll wait until others chime in in case I've misinterpreted the original intent...