Ensure stability of clause order for DisjunctionMaxQuery toString
Since https://github.com/apache/lucene/pull/110, the disjuncts of a DisjunctionMaxQuery no longer have a defined order, which affects the toString method. In isolation, that does not matter. But in Solr, when the debug component is used for a distributed query, every shard can return a different toString representation of the same query. The toString keys of the debug response then hold an array of those differing representations, instead of a single value for what is one and the same query.
Example with the parsedquery_toString key (of a json response within Solr):
"parsedquery_toString":["((docIdentifiers:\"Okarandeep Osingh\" docIdentifiers:Otest) | (docTitle:\"Okarandeep Osingh\" docTitle:Otest) | (docBody:\"Okarandeep Osingh\" docBody:Otest))","((docBody:\"Okarandeep Osingh\" docBody:Otest) | (docTitle:\"Okarandeep Osingh\" docTitle:Otest) | (docIdentifiers:\"Okarandeep Osingh\" docIdentifiers:Otest))"]
When PR110 was merged, Solr adapted its unit tests this way: https://github.com/apache/solr/pull/117. Later on, within Lucene, the toString method of DisjunctionIntervalsSource was adapted in anticipation of a similar future change: https://github.com/apache/lucene/pull/193.
I adapted the toString method of DisjunctionMaxQuery in the same way as that PR.
Thanks for the feedback. Looking at BooleanQuery, it "only" has one list, List<BooleanClause> clauses. So, is the idea to have two structures in DisjunctionMaxQuery, the unordered multiset of queries and a sorted list of queries, where the latter is only used by the toString method?
See BooleanQuery#clauseSets, which is used for equals()/hashCode(), and BooleanQuery#clauses, which is used for toString().
I needed to sort the queries in some way, so I compare them by their toString representation:
orderedQueries.sort(Comparator.comparing(Query::toString));
Not sure if it's the right way.
I wouldn't sort them; why not just rely on the order that the caller supplied?
Ha, I see. Could we say that the new List<Query> orderedQueries would behave the same way Query[] disjuncts did before https://github.com/apache/lucene/pull/110/files? If so, I presume it would work.
Yes, exactly.
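For readers following the thread, here is a minimal sketch of the two-structure idea discussed above. It is not the Lucene code: DisMaxSketch is a hypothetical class, plain strings stand in for Query, and a HashMap of counts stands in for the multiset. Only the name orderedQueries comes from this discussion. The tie-breaker and any other details of the real toString are left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DisjunctionMaxQuery; strings play the role of Query.
final class DisMaxSketch {
    // Unordered multiset (clause -> occurrence count): used only by equals()/hashCode().
    private final Map<String, Integer> disjuncts = new HashMap<>();
    // Clauses in the order the caller supplied them: used only by toString().
    private final List<String> orderedQueries;

    DisMaxSketch(Collection<String> clauses) {
        this.orderedQueries = new ArrayList<>(clauses);
        for (String clause : clauses) {
            disjuncts.merge(clause, 1, Integer::sum);
        }
    }

    @Override
    public boolean equals(Object other) {
        // Equality ignores clause order, since only the multiset is compared.
        return other instanceof DisMaxSketch
            && disjuncts.equals(((DisMaxSketch) other).disjuncts);
    }

    @Override
    public int hashCode() {
        return disjuncts.hashCode();
    }

    @Override
    public String toString() {
        // Stable output: no sorting, just the caller-supplied order.
        return "(" + String.join(" | ", orderedQueries) + ")";
    }
}
```

With this shape, two instances built from the same clauses in a different order still compare equal, while each one prints its clauses in the order it was given.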
Can you add an entry to lucene/CHANGES.txt under version 10.1.0? Then I'll merge.
Done. Thanks for reviewing!
Thanks for this! I had to redo a bunch of tests over this matter at work.