hibernate-search icon indicating copy to clipboard operation
hibernate-search copied to clipboard

HSEARCH-4601 Syntactic sugar for OR/AND predicates

Open YannisBres opened this issue 3 years ago • 13 comments

https://hibernate.atlassian.net/browse/HSEARCH-4601

Hi!

So, I went back to this PR and I hope I got it right this time after my initial botching.

I have added support for basic n-ary or(...) / and(...) constructs à la

f.or( f.match().field("foo").matching("val1"),
      f.match().field("bar").matching("val2") )
f.and( f.match().field("foo").matching("val1"),
       f.match().field("bar").matching("val2") )

I haven't yet added builder-style constructs with .add(...) and with(...). I guess that should be done.

I haven't updated the documentation yet either.

Before I take the next step, is this going in the right direction? ;-)

I have added new tests in BoolPredicateSpecificsIT, as they are more or less cloned from bool() ones. This makes the file name not very relevant... Shall I add a new IT file of its own, at the risk of more duplicated code? If not, shall "duplicated" methods be grouped further, e.g. in @Nested test classes? Also, aren't the tests in SmokeIT a bit redundant/pointless? Maybe they should not be added there...

So far, AndPredicateClausesStep and OrPredicateClausesStep don't implement PredicateScoreStep, which would allow calling boost(...) and constantScore() on the overall filter. Shouldn't they?

Similarly, so far, OrPredicateClausesStep doesn't support any of the minimumShouldMatch constraints. Shall it? That looks to me a bit far away from the expected semantic of an OR (with minimumShouldMatch == 1), so maybe such a feature should be reserved to more complex bool()-based predicates.

(If nothing should be added, the Javadoc "@return The initial step of a DSL" should be revised.)

Shall something be done to support mustNot? Again, I'd leave that to bool()...

Shall SearchPredicateFactory.and(...) and or(...) be tagged as @Incubating at first?

As usual, using varargs with a parameterized type requires a lot of @SuppresssWarnings("unchecked"), including in calling code... :'-( Would you happen to have anything better to suggest? (Besides n overloads for all possible arities. ;-))

Best regards, Yannis

YannisBres avatar Jul 22 '22 11:07 YannisBres

Hi !

I guess we agree on everything: thanks for the constructive and professional comments!

I'm a bit sad for the and( Function<...>... ) (and or) overloads, but of course I can't judge [yet?] how not that relevant they are.

DONE:

  • Moved new tests in dedicated AndPredicateSpecificsIT and OrPredicateSpecificsIT, instead of "polluting" BoolPredicateSpecificsIT. Could you please confirm that duplicated code (around test data initialization) is accepted with all its consequences? Don't we want to share something somewhere (with, conversely, the consequences of stronger coupling)?
  • Removed the tests added to SmokeIT, as this file has been deleted in another PR.
  • Corrected non-updated copy/paste in Javadoc of SearchPredicateFactory.

TODO:

  • [ ] 2 code smells reported by Sonar: S is indeed not used in AndPredicateClausesStep and OrPredicateClausesStep. To be removed if chaining / a self() method with covariant return type is indeed unnecessary (but I guess it will be necessary).
  • [ ] AndPredicateClausesStep and OrPredicateClausesStep should implement PredicateScoreStep in order to support boost(...) and constantScore().
  • [ ] Rename AndPredicateClausesStep to something like AndPredicateOptionsStep (see ExistsPredicateOptionsStep and MatchPredicateOptionsStep).
  • [ ] Ditto for OrPredicateClausesStep.
  • [ ] Remove the overload AbstractSearchPredicateFactory.and( Function<...>... ) + or( Function<...>... ).
  • [ ] Clean up copy-pasted code that ended up being unnecessary in AndPredicateSpecificsIT and OrPredicateSpecificsIT.

Carried-over TODOs:

  • [ ] Add builder-style constructs with .add(...) and with(...).
  • [ ] Documentation

YannisBres avatar Jul 27 '22 15:07 YannisBres

Could you please confirm that duplicated code (around test data initialization) is accepted with all its consequences? Don't we want to share something somewhere (with, conversely, the consequences of stronger coupling)?

I sense you're probing for how well we apply best practices, so I'll detail a bit.

We avoid duplicating code.

In the main code, the only parts where we do duplicate code is APIs, when we have to (multiple Maven modules, each with its own API but with subtle differences).

In tests, and actually made a great effort to avoid code duplication. Just have a look at MatchPredicateBaseIT, RangePredicateBaseIT, etc. A great many tests are shared across multiple predicates, but as you can see, it becomes much harder to understand what's going on in a particular test. For instance, you have to read through three different classes to get a complete view of what happens exactly in org.hibernate.search.integrationtest.backend.tck.search.predicate.MatchPredicateBaseIT.AnalysisIT#multiIndex_incompatibleAnalyzer.

So, really, as with most things, it's a trade-off. Avoiding duplication is fine but if it turns the code into a write-only, overly complex mess that external contributors cannot understand anymore (and frankly we're getting close...)... then it's not worth it.

I also generally accept higher complexity for the main code than for tests, because, well, can you trust tests that are more complex than what you're testing?

If you can find a way to "factorize" the duplicated code, have one test class for bool, one for or, and one for and, while keeping the code clear... then sure, I'm all for it. I just don't think it will be easy.

2 code smells reported by Sonar: S is indeed not used in AndPredicateClausesStep and OrPredicateClausesStep. To be removed if chaining / a self() method with covariant return type is indeed unnecessary (but I guess it will be necessary).

This is about backwards compatibility.

Indeed, we don't need the S parameter right now. But if we add methods to AndPredicateClausesStep and OrPredicateClausesStep, then we might need it. And adding the S parameter in a later version may break backwards compatibility for our users: if their code refers to AndPredicateClausesStep explicitly (ideally they shouldn't have to, but well, shit happens) at best they'll get raw type warnings, at worst their code won't compile anymore.

Anyway, once you add support for constantScore/boost by extending PredicateScoreStep, you'll see the S parameter becomes necessary, so you won't have this problem anymore :)


Regarding the code, I'll do another review when it's complete, unless you have specific questions ;)

yrodiere avatar Jul 28 '22 10:07 yrodiere

Thanks for the clarifications! I guess we agree that [non-]duplicated code is always a trade-off, and I do agree on the reasoning that tests should be kept as simple as possible in order to avoid complexity-induced side-effects that may hamper their initial purpose.

Speaking about duplicated code (and the corresponing trade-offs), implementations of PredicateScoreStep are very similar in ExistsPredicateFieldStepImpl, CommonState, MatchIdPredicateMatchingStepImpl, AbstractBooleanPredicateClausesStep, and now in AndPredicateOptionsStepImpl and OrPredicateOptionsStepImpl... I'm wondering whether it could be relevant to introduce some intermediate abstract class below AbstractPredicateFinalStep in order to factorize such concerns. That would probably imply to store the builder at that level (=> getter or... protected final member [non-]variable?), to add yet another parameterized type for it and to generalize a covariant self() method... (But I'll consider this as beyond the scope of this PR.)

(+1 also for the S parameterized type: that's why I kept it!)

YannisBres avatar Jul 28 '22 15:07 YannisBres

DONE:

  • Renamed AndPredicateClausesStep to AndPredicateOptionsStep and OrPredicateClausesStep to OrPredicateOptionsStep.
  • AndPredicateOptionsStep and OrPredicateOptionsStep now implement PredicateScoreStep. => Cloned and adapted BoolPredicateBaseIT to AndPredicateBaseIT and OrPredicateBaseIT.

Carried-over TODOs:

  • [ ] Add builder-style constructs with .add(...) and with(...).
  • [ ] Remove the overload AbstractSearchPredicateFactory.and( Function<...>... ) + or( Function<...>... ).
  • [ ] Clean up copy-pasted code that ended up being unnecessary in AndPredicateSpecificsIT and OrPredicateSpecificsIT.
  • [ ] Document the new feature.

YannisBres avatar Jul 28 '22 15:07 YannisBres

I'm wondering whether it could be relevant to introduce some intermediate abstract class below AbstractPredicateFinalStep in order to factorize such concerns. That would probably imply to store the builder at that level (=> getter or... protected final member [non-]variable?), to add yet another parameterized type for it and to generalize a covariant self() method...

That's a lot of effort for two (very) simple methods, but sure, feel free to create a ticket (a Task). It's probably going to be non-trivial, though; see org.hibernate.search.engine.search.predicate.dsl.impl.AbstractBooleanMultiFieldPredicateCommonState#applyBoostAndConstantScore.

And yes, renaming thisAsS to self across the board would be a good idea as well.

yrodiere avatar Jul 29 '22 08:07 yrodiere

Also:

to add yet another parameterized type for it

I don't think that's necessary, but we shall see.

yrodiere avatar Jul 29 '22 08:07 yrodiere

DONE:

  • Added builder-style constructs with .add(...) and with(...). add(...) supports both PredicateFinalStep and SearchPredicate (in order to support previously-built predicates): did that make sense?
  • Removed the AbstractSearchPredicateFactory.and( Function<...>... ) + or( Function<...>... ) overloads.

Carried-over TODOs:

  • [ ] Document the new feature.

YannisBres avatar Aug 01 '22 10:08 YannisBres

TODO :

  • [ ] In add(...) methods, add support for Function<? super SearchPredicateFactory, ? extends PredicateFinalStep>.
  • [ ] Make so that using add(...) and with() is not "allowed" from the and(...)/or(...) "constructors" that already allow to specify some predicates. (Cf. https://github.com/hibernate/hibernate-search/pull/3152#discussion_r934413732.)
  • [ ] Prevent support of constructs à la f.and().with( c -> c.toPredicate() ).add( ... ). (Cf. https://github.com/hibernate/hibernate-search/pull/3152#discussion_r934420835.)

Carried-over TODOs:

  • [ ] Document the new feature.

YannisBres avatar Aug 01 '22 14:08 YannisBres

DONE :

  • In add(...) methods, add support for Function<? super SearchPredicateFactory, ? extends PredicateFinalStep>.
  • Merged the two class hierarchies for and and or in a single one, as only one method call actually changes and interface/classes names don't seem very relevant (+ support for mustNot is coming)...
  • Reworked the class hierarchy so that:
    • add(...) and with() are no longer accessible from the and(...)/or(...) "constructors" that allow to specify some predicates, but only from the parameterless "constructors".
    • In consumers, toPredicate() is no longer accessible.
    • As such, the following code doesn't compile: image

FTR, the class hierarchy now looks like this (I wish I had IntelliJ Ultimate...): SimplePredicate

Carried-over TODOs:

  • [ ] Document the new predicates.

YannisBres avatar Aug 08 '22 01:08 YannisBres

  • support for mustNot is coming

I'd say that PR is already more than enough work, let's not bother right now ;)

That's a separate ticket anyway: https://hibernate.atlassian.net/browse/HSEARCH-4645

yrodiere avatar Aug 08 '22 06:08 yrodiere

FTR, the build failure was due to an attempt to address an IntelliJ warning, which in turn triggered a Javadoc warning... ;-) Will undo!

YannisBres avatar Aug 08 '22 13:08 YannisBres

DONE:

  • Renamed everything to ~*SimpleBooleanOperatorPredicate*.

Carried-over TODOs:

  • [ ] Document the new predicates.

YannisBres avatar Aug 08 '22 15:08 YannisBres

DONE:

  • First iteration on the documentation:
    • Inserted an "and: combine predicates as a conjunction" and an "or: combine predicates as a disjunction" section before the initial "bool: combine predicates (or/and/...)" section, renamed as "bool: advanced combinations of predicates (or/and/…​)". (Intros sound a bit verbose to me: WDYT?)
    • Replaced the examples for pure and and or predicates with "[...] using the simpler and/or syntax is recommended" and corresponding references.

TODO:

  • [ ] Links to the initial bool section should be augmented with links to and and or as appropriate.
  • [ ] Some examples that only use AND/OR pattern should be simplified with the simpler syntax: OK?
  • [ ] There should be a section "Adding clauses dynamically with the lambda syntax" for the and predicate (and a reference to it in order to avoid repetitions in the or section) in order to introduce the add and with syntaxes. However, I'm wondering whether we'd need specific "overloads" of SearchQueryWhereStep.where(...) for that purpose, that clearly state if we want an and or an or here... WDYT?

Any other comments?

Here is the generated PDF for easier review: hibernate_search_reference.pdf.

YannisBres avatar Aug 15 '22 23:08 YannisBres

DONE:

  • Addressed review comments.
  • In the new and and or sections, added "Adding clauses dynamically with the lambda syntax" subsections. Note that the "cascaded with" syntax is only explained in the or section, as the point is mostly to combine two different operators (=> these two different operators must have been introduced first).
  • Simplified the examples that used pure bool()-based AND/OR (see attached PDF for easier review):
    • Example 65. Searching with a flattened structure
    • Example 146. Declaring a named predicate
    • Skipped Example 243. Using implicit nesting as it would be confusing to switch to the simplified syntax as this example is a variation of "Example 242. Matching multiple predicates against a single nested object".
    • Example 258. Boosting on a per-predicate basis
    • Example 260. Making the score of a predicate constant
    • Example 342. Targeting a field using relative paths

Here is the generated PDF for easier review: hibernate_search_reference.pdf.

And now, by reading the examples with lambdas, i.e. when the operator instantiation and the add operations are distant from one another, I'm wondering if such syntax isn't requiring an excessive cognitive load for casual readers to figure out what kind of operation this add is performing. I'd say the cognitive load would be reduced if the methods were to be more explicitly named and and or. For the forthcoming not predicate (HSEARCH-4645), renaming add to something like andNot may also be more explicit. In addition to making code more self-explanatory, having specific "add" methods would be less error-prone when different predicates are used in different with(...) constructs, as in "Example 228. Easily adding clauses dynamically using with(…) and the lambda syntax". Note that doing so would totally mess the current class hierarchy, which is quite operator-agnostic as it is... ;-)

YannisBres avatar Aug 19 '22 10:08 YannisBres

Thanks for your pull request!

This pull request appears to follow the contribution rules.

› This message was automatically generated.

I rebased, applied my suggestions and force-pushed to your branch. Let's see what CI has to say...

yrodiere avatar Aug 22 '22 12:08 yrodiere