guava
guava copied to clipboard
Make `Iterators.mergeSorted()` stable
I've noticed that Iterators.mergeSorted()
is unstable.
So for example if I have arrays: [A_1, B_1, C_1]
[A_2, B_2, C_2]
(where the comparator looks only at the letter), I'd like the result to iterate [A_1, A_2, B_1, B_2, C_1, C_2]
. The current implementation doesn't guarantee this.
Might it be possible to add a stable version?
I confess it's not obvious what to do if the arrays are something like [A_1, A_2, B_1]
, [A_3]
- to me it would make sense to empty the first iterator of A_n
elements before moving on to the second, so [A_1, A_2, A_3, B_1]
.
[I appreciate I'm being a little lazy with my terminology here - I can provide a fully fleshed example if it's not clear]
Could you provide some examples please?
@tuannh982: given a really simple data class:
public static class Datum {
private final String letter;
private final int number;
public Datum(String letter, int number) {
this.letter = letter;
this.number = number;
}
public String letter() { return letter; }
public int number() { return number; }
@Override
public String toString() {
return letter + " " + number;
}
}
And the following test:
@Test
public void sortMe() {
List<Datum> left = List.of(
new Datum("B", 1),
new Datum("C", 1)
);
List<Datum> right = List.of(
new Datum("A", 2),
new Datum("C", 2)
);
var comparator = Comparator.comparing(Datum::letter);
var it = Iterators.mergeSorted(
List.of(left.iterator(), right.iterator()), comparator);
it.forEachRemaining(next->System.out.println(next));
}
I'd get:
A 2
B 1
C 2
C 1
Note how C 2
comes before C 1
. If the sort were stable I'd expect to see C 1
before C 2
because that's the order they appear in the input.
Full code here: https://gist.github.com/msmerc/279c32fe6620b0dcf2dc6f2c4dfa1469
@msmerc: okay, please assign this issue to me.
@chaoren @msmerc Hello. I would like to work on this issue. But This is my first time contributing to an open-source project. Therefore, I need some guidance. Could you please give me more information about this issue?
@HarunSMetin what exactly isn't clear?
@msmerc May I know why @tuannh982 's CR couldn't be accepted and issue is still open..?
@tanmauec see comment here: https://github.com/google/guava/pull/5779
BTW: I'm just the reporter, I have no say in what gets merged or why!
@chaoren / google folk: I've had a go at solving this, see https://gist.github.com/msmerc/fff91f4049dd4d8e39f3614bb0c58acb
My solution is to generate a comparable such that objects are equal (using the original comparator) it will compare the position in the initial array. Is this something you'd want in the main library? Let me know.
Hi, I am new to open source contribution projects. I see that this issue is still open. I was hoping that I could work on it if it has not already been implemented?
Hi! Can I make a PR?
I'm sorry we slept on this issue a while.
I see no reason NOT to make mergeSorted
stable, and certainly no reason to add a separate stable form of it. Any performance impact would surely be small. Tied elements coming out in a different order might temporarily make a unit test or two fail, but seems unlikely to present a real problem beyond that.
The first PR that would be useful would have test cases that illustrate the problem -- this is unorthodox, but yes, I mean they actually verify that the problem does exist. By default I would assume that we want to find a good variety of these test cases, but if we are convinced that all the cases follow the same simple pattern it might not be necessary. (I'm not even reading the mergeSorted code right now.)
We would pull that and then the next PR could simultaneously make the method stable and change the tests just added accordingly.
I guess we leave it to you all to fight over who gets to do it? Keep it civil now :-)