scala-collection-contrib icon indicating copy to clipboard operation
scala-collection-contrib copied to clipboard

Possible bug in implementation of `MapDecorator.mergeByKeyWith`.

Open sageserpent-open opened this issue 10 months ago • 0 comments

Seen in

0.3.0

Reproduction

Scala worksheet, so assertions are verified by eye. Using mergeByKey for convenience, knowing it delegates to mergeByKeyWith.

import scala.collection.decorators.mapDecorator

val arthur = "arthur.txt"

val tyson = "tyson.txt"

val sandra = "sandra.txt"

val allKeys = Set(arthur, tyson, sandra)

val sharedValue = 1

val ourChanges = Map(
  (
    arthur,
    sharedValue
  ),
  (
    tyson,
    2
  )
)

val theirChanges = Map(
  (
    arthur,
    sharedValue
  ),
  (
    sandra,
    3
  )
)

ourChanges -> theirChanges

ourChanges.mergeByKey(theirChanges)

// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys

theirChanges.mergeByKey(ourChanges)

// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys

// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
  .mergeByKey(theirChanges)
  .values
  .map(_.swap)
  .toList
  .sorted
  .sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)

// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
  .mergeByKey(ourChanges)
  .keys

val theirChangesRedux = Map(
  (
    arthur,
    sharedValue
  ),
  (
    sandra,
    sharedValue // <<<<------- Ahem!
  )
)

ourChanges -> theirChangesRedux

ourChanges.mergeByKey(theirChangesRedux)

// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys

theirChangesRedux.mergeByKey(ourChanges)

// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys

// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
  .mergeByKey(theirChangesRedux)
  .values
  .map(_.swap)
  .toList
  .sorted
  .sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)

// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
  .mergeByKey(ourChanges)
  .keys

Expectation

The Scaladoc for mergeByKey states:

Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.

So all the keys from ourChanges and theirChanges should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges and theirChanges should appear in the associated pairs, wrapped in Some.

Observed

ourChanges.mergeByKey(theirChangesRedux) drops key sandra, whereas theirChangesRedux.mergeByKey(ourChanges) preserves all keys.

Discussion

There is a set, traversed that is used to prevent duplicate key processing when switching from iterating over from coll to other. It is however typed as a set over the value type W of other, and is populated with values, not keys.

In the example above, the use of a shared value triggers the problem, causing the entry for key sandra to be dropped.

I am assuming this was a typo and not the desired behaviour.

If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.

I'll wait until others chime in in case I've misinterpreted the original intent...

sageserpent-open avatar Apr 01 '24 10:04 sageserpent-open