assertj icon indicating copy to clipboard operation
assertj copied to clipboard

assertThat(collection).doesNotHaveDuplicates() is not sorted, even if the collection is

Open Bananeweizen opened this issue 1 year ago • 12 comments

The error message of assertThat(collection).doesNotHaveDuplicates() lists the duplicates in a different order than the elements of the collection actually have. That is not an error per-se, but can make it harder to find the root cause of the failing assertion. So this might also be labeled improvement instead of bug.

  • assertj core version: 3.25.1

Test case reproducing the bug

    @Test
    void sortOrder() throws Exception {
        var list = List.of("org.objectweb.asm.source",
                "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
                "org.eclipse.emf.compare.rcp.ui.source.feature.group",
                "org.objectweb.asm",
                "org.eclipse.jgit.feature.group",
                "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
                "org.objectweb.asm",
                "org.apache.httpcomponents.httpclient",
                "org.eclipse.emf.compare.rcp.ui.source.feature.group",
                "org.eclipse.jgit.feature.group",
                "org.eclipse.emf.compare.ide.ui.source.feature.group");
        assertThat(list.stream().sorted()).doesNotHaveDuplicates();
    }

Note that the list gets sorted in the assertion. Still the output is not sorted, because the duplicates are collected in a set which is not insertion order stable.

Found duplicate(s):
  ["org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.objectweb.asm",
    "org.eclipse.jgit.feature.group"]
in:
  ["org.apache.httpcomponents.httpclient",
    "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.ide.ui.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.eclipse.jgit.feature.group",
    "org.eclipse.jgit.feature.group",
    "org.objectweb.asm",
    "org.objectweb.asm",
    "org.objectweb.asm.source"]

Bananeweizen avatar Jan 10 '24 11:01 Bananeweizen

Fair enough, if I understand this correctly the fix should consist of using a linked hash set to store the duplicates.

@Bananeweizen do you want to contribute this one?

joel-costigliola avatar Jan 10 '24 14:01 joel-costigliola

Yes, I can do that. I still have to understand if the currently used comparator has some useful meaning for another (unknown to me) aspect. If not, I'd also expect this to be a trivial replacement of the TreeSet by a LinkedHashSet.

Bananeweizen avatar Jan 10 '24 14:01 Bananeweizen

If memory serves, I think we used a treeset to have consistent ordering in our tests, this can be achieved better with a linked hash set though.

joel-costigliola avatar Jan 10 '24 14:01 joel-costigliola

@Bananeweizen can I take this for my first contribution, would help me in getting familiar with the PR process

ranjitshinde91 avatar Jan 10 '24 15:01 ranjitshinde91

sure

Bananeweizen avatar Jan 10 '24 15:01 Bananeweizen

TreeSet with custom comparator is used to compare objects in Set collections. Hence simply changing TreeSet to LinkedHashSet does not work.

Added fix as part of this PR https://github.com/assertj/assertj/pull/3333

ranjitshinde91 avatar Jan 10 '24 19:01 ranjitshinde91

@Bananeweizen quick question regarding the expected ordering. Given the following input which is found in the test ["Merry", "Frodo", null, null, "Merry", "Sam", "Frodo"], there are two possible ways to return the duplicates in order:

  1. ["Merry", "Frodo", null]
  2. [null, "Merry", "Frodo"] Option 1 starts with the first elements that is a duplicate, while the second starts with the first duplicated element. I think they are both valid in their own way, like for debugging purpose, option 2 tells me null is the first duplicated element, which I think is helpful, while option one seems like the natural way. I know @ranjitshinde91 implementation is based on 1. above, checking if option 2 can be considered.

kulgan avatar Jan 13 '24 23:01 kulgan

Quite honestly, I had not even thought about that until you asked. I think I would be fine with both. The question of which one would be more nice for setting a breakpoint afterwards also depends, on whether I as the developer consider the first duplicate wrong (e.g. the first Merry should not have been added to begin with), or the second one. More typical might be to consider the second duplicate wrong, so I can understand that you would eventually prefer option 2. As I said, I would be fine with either variant.

Bananeweizen avatar Jan 14 '24 11:01 Bananeweizen

@kulgan good point, I agree with @Bananeweizen that there isn't much difference between the two options, so I would go with the simplest implementation on this one

joel-costigliola avatar Jan 14 '24 12:01 joel-costigliola

I agree with @Bananeweizen when I started implementation i considered second Merry to be wrong and added in duplicates. Also 2nd Implementation is simpler and does not require second traversal of Interable done in private method.

I will make the necessary changes to switch to option 2, if that's okay ?

ranjitshinde91 avatar Jan 14 '24 15:01 ranjitshinde91

Go for it @ranjitshinde91, thanks!

joel-costigliola avatar Jan 14 '24 15:01 joel-costigliola

updated the same PR with these changes

ranjitshinde91 avatar Jan 14 '24 15:01 ranjitshinde91