guava icon indicating copy to clipboard operation
guava copied to clipboard

Make `Iterators.mergeSorted()` stable

Open msmerc opened this issue 2 years ago • 11 comments

I've noticed that Iterators.mergeSorted() is unstable.

So for example if I have arrays: [A_1, B_1, C_1] [A_2, B_2, C_2] (where the comparator looks only at the letter), I'd like the result to iterate [A_1, A_2, B_1, B_2, C_1, C_2]. The current implementation doesn't guarantee this.

Might it be possible to add a stable version?

I confess it's not obvious what to do if the arrays are something like [A_1, A_2, B_1], [A_3] - to me it would make sense to empty the first iterator of A_n elements before moving on to the second, so [A_1, A_2, A_3, B_1].

[I appreciate I'm being a little lazy with my terminology here - I can provide a fully fleshed example if it's not clear]

msmerc avatar Nov 03 '21 18:11 msmerc

Could you provide some examples please?

tuannh982 avatar Nov 09 '21 05:11 tuannh982

@tuannh982: given a really simple data class:

    public static class Datum {
        private final String letter;
        private final int number;

        public Datum(String letter, int number) {
            this.letter = letter;
            this.number = number;
        }

        public String letter() { return letter; }
        public int number() { return number; }

        @Override
        public String toString() {
            return letter + " " + number;
        }
    }

And the following test:

    @Test
    public void sortMe() {
        List<Datum> left = List.of(
                new Datum("B", 1),
                new Datum("C", 1)
        );

        List<Datum> right = List.of(
                new Datum("A", 2),
                new Datum("C", 2)
        );
        
        var comparator = Comparator.comparing(Datum::letter);
        var it = Iterators.mergeSorted(
                List.of(left.iterator(), right.iterator()), comparator);
        it.forEachRemaining(next->System.out.println(next));
    }

I'd get:

A 2
B 1
C 2
C 1

Note how C 2 comes before C 1. If the sort were stable I'd expect to see C 1 before C 2 because that's the order they appear in the input.

Full code here: https://gist.github.com/msmerc/279c32fe6620b0dcf2dc6f2c4dfa1469

msmerc avatar Nov 09 '21 09:11 msmerc

@msmerc: okay, please assign this issue to me.

tuannh982 avatar Nov 09 '21 10:11 tuannh982

@chaoren @msmerc Hello. I would like to work on this issue. But This is my first time contributing to an open-source project. Therefore, I need some guidance. Could you please give me more information about this issue?

HarunSMetin avatar Jan 30 '22 10:01 HarunSMetin

@HarunSMetin what exactly isn't clear?

msmerc avatar Feb 01 '22 09:02 msmerc

@msmerc May I know why @tuannh982 's CR couldn't be accepted and issue is still open..?

tanmayvarun avatar Apr 23 '22 13:04 tanmayvarun

@tanmauec see comment here: https://github.com/google/guava/pull/5779

BTW: I'm just the reporter, I have no say in what gets merged or why!

msmerc avatar Apr 25 '22 09:04 msmerc

@chaoren / google folk: I've had a go at solving this, see https://gist.github.com/msmerc/fff91f4049dd4d8e39f3614bb0c58acb

My solution is to generate a comparable such that objects are equal (using the original comparator) it will compare the position in the initial array. Is this something you'd want in the main library? Let me know.

msmerc avatar May 03 '22 08:05 msmerc

Hi, I am new to open source contribution projects. I see that this issue is still open. I was hoping that I could work on it if it has not already been implemented?

u6885077 avatar Oct 26 '22 05:10 u6885077

Hi! Can I make a PR?

igmtz avatar Apr 24 '23 21:04 igmtz

I'm sorry we slept on this issue a while.

I see no reason NOT to make mergeSorted stable, and certainly no reason to add a separate stable form of it. Any performance impact would surely be small. Tied elements coming out in a different order might temporarily make a unit test or two fail, but seems unlikely to present a real problem beyond that.

The first PR that would be useful would have test cases that illustrate the problem -- this is unorthodox, but yes, I mean they actually verify that the problem does exist. By default I would assume that we want to find a good variety of these test cases, but if we are convinced that all the cases follow the same simple pattern it might not be necessary. (I'm not even reading the mergeSorted code right now.)

We would pull that and then the next PR could simultaneously make the method stable and change the tests just added accordingly.

I guess we leave it to you all to fight over who gets to do it? Keep it civil now :-)

kevinb9n avatar Jun 07 '23 21:06 kevinb9n