ramda icon indicating copy to clipboard operation
ramda copied to clipboard

R.unionWith documentation doesn't say which one of equal elements will be included

Open emlai opened this issue 5 years ago • 8 comments

const l1 = [{a: 1, b: 'b'}];
const l2 = [{a: 1, c: 'c'}];
R.unionWith(R.eqBy(R.prop('a')), l1, l2);

It's not clear from the documentation whether this should return [{a: 1, b: 'b'}] or [{a: 1, c: 'c'}].

emlai avatar Oct 04 '19 11:10 emlai

Since Ramda focuses on value equality, we don't make the call. I'm not sure if we still do, but at one point, at least one of our Set-like array operations varied in that behavior based upon which of the two supplied lists was shorter (a performance optimization that was perhaps not the best idea.)

I'm not sure Ramda would want to commit to choosing a or b here. My feeling is that if you're saying these two are equal, then you shouldn't care which one is included, or even if it's a different object altogether that also tests equal (not that we would do the latter, of course, but it shouldn't matter.)

CrossEye avatar Oct 04 '19 23:10 CrossEye

Ok, that applies to R.union, but for R.unionWith the results are different values. I modified to issue to be just about R.unionWith.

My use case is I want to combine two arrays but prefer values from one of the arrays if there are duplicates as determined by the predicate function. I just want to know if I can rely on the order of the array arguments, or if the behavior is explicitly unspecified and I should look for another solution. I think this information should be in the documentation.

emlai avatar Oct 08 '19 14:10 emlai

It seems to me that the same thing applies to unionWith. But if you want to make a PR to clarify the documentation, pointing to the current behavior (taking the first-found value), it will definitely get a fair hearing.

If you do, I would love to see the example changed to use a lambda for the predicate rather than R.eqBy(R.prop('a'), something like (x, y) => x.a == y.a. I think the function examples should be as stand-alone as we can reasonably make them.

CrossEye avatar Oct 08 '19 15:10 CrossEye

For uniqWith we already have the following in its documentation: "Prefers the first item if two items compare equal based on the predicate". So a similar sentence could be added for unionWith.

I might do a PR at some point if I get time, but if somebody wants to pick this up before that, feel free.

emlai avatar Oct 09 '19 09:10 emlai

Two comments to unionWith function:

  • The order of the params for the predicate function looks wrong to me. First parameter corresponds with the second object. and second parameter corresponds with the first object. This makes the first example to fail and the second to work.
R.unionWith((l, r) => l.date.substr(0,10) === r.date, [{date:'2020-01-02T00:00:00'}],  [{date:'2020-01-02'}]) 
//?  [{date:'2020-01-02T00:00:00'}, {date:'2020-01-02'}]

And this works:

R.unionWith((l, r) => l.date === r.date.substr(0,10), [{date:'2020-01-02T00:00:00'}],  [{date:'2020-01-02'}]) 
//? [{date:'2020-01-02T00:00:00'}]
  • I am not sure about this one... but shouldn't it better to take the second item when there is a match? Last parameter usually has more preference considering (data last parameter rule). If we were talking about a function called update, modify or something similar which reinforces the idea of first params modifying the master... I would understand the case... but union doesn't give strength to one side... so being even by the operation, then last parameter is the master by default.... it is like the main stream of data (last parameter) only accepting new data and rejecting duplicated data.

josuamanuel avatar Apr 02 '21 22:04 josuamanuel

@ josuamanuel:

The order of the params for the predicate function looks wrong to me.

But the point of this function is that the user is saying that the two values are equal. Equality is transitive. If there are any a and b values such that

fn (a, b) ≠ fn (b, a)

then fn is not a suitable predicate to pass to functions such as unionWith. Any function that treats a and b differently is suspect.

[S]houldn't it better to take the second item when there is a match?

While there's an interesting argument there about API consistency, the user should never care. If we're doing a union of two distinct types that are close enough, the user shouldn't care which of the values is returned.

Similarly to the above, things should be symmetric. If fn is our equality function then the user should not ever need to care about the difference between unionWith (fn) (xs, ys) and unionWith (fn) (ys, xs). If a difference between them could cause problems further in the codebase, then unionWith is not suitable for that usage.

There was at some point an interesting performance enhancement which removed some consistency in how the parameters were ordered. We could satisfy such requests by undoing that, but it seems to me that there is no gain, only loss. If the order of the parameters to the predicate or the order of the last two parameters to union* matters to a codebase, I would suggest that the user find a different tool than union*.

CrossEye avatar Apr 03 '21 01:04 CrossEye

@CrossEye It looks really restricted in both sides fn (a, b) must equal fn (b, a) and unionWith(fn) (xs, ys) must equal (fn) (ys, xs)

Documentation is a little bit short in these aspects.

I understand now... I ended up using Map that probably is a better tool for my purpose and with much much better performance.

Thanks for the great explanation!!!

josuamanuel avatar Apr 04 '21 01:04 josuamanuel

@josuamanuel:

Documentation is a little bit short in these aspects.

We'd love to hear suggestions, either here or as a pull request!

I understand now... I ended up using Map that probably is a better tool for my purpose and with much much better performance.

I would always choose Maps/Sets for primitive types. Ramda allows for value equality as well, and it useful for those cases where Map/Set won't do. (For instance, new Set ([{x: 1}, {x: 1}]) has size of 2, where the Ramda tools would see only one.) But it is much less performant for the primitive cases, as it has to call Ramda's equals function for many different pairs.

CrossEye avatar Apr 05 '21 18:04 CrossEye