solr icon indicating copy to clipboard operation
solr copied to clipboard

SOLR-5707: Lucene Expressions in Solr

Open risdenk opened this issue 3 years ago • 5 comments

https://issues.apache.org/jira/browse/SOLR-5707

This is me taking Hoss's VSP patch from https://issues.apache.org/jira/browse/SOLR-5707 and trying to look at it again.

  • [x] ./gradlew check -x test -Pvalidation.errorprone=true -Pvalidation.sourcePatterns.failOnError=false passes :)
  • [ ] nocommit are still in there
  • [ ] do we NEED to deal w/ score? can we do score different now that its been years later
  • [ ] does ./gradlew -p solr/core test --tests ExpressionValueSourceParserTest pass?
  • [ ] do all tests pass?
  • [ ] how does this perform - this is my real goal here compared to other say boostfunctions

risdenk avatar Dec 16 '22 15:12 risdenk

Not looking at this anymore

risdenk avatar Oct 30 '24 19:10 risdenk

I commented in JIRA about my interest in furthering this. I'm sure there are some nice-to-have's (score function access, documentation), but simply working code & tested is enough to merge. I can brush the dust off get this PR mergeable; what do you say @risdenk ? This functionality plugs in nicely, showcasing Solr's great abstractions without needing to hack on any Solr plumbing. I would later expand the usage to support access to text fields via a DocTransformer use case but I think another PR can do that on top of the fine work already done here.

dsmiley avatar May 01 '25 20:05 dsmiley

Go for it @dsmiley I ended up not needing it but it did work when I was trying it

risdenk avatar May 01 '25 20:05 risdenk

Updating the PR to main was easy. Note that there is another PR #3340 that will overlap with this one which makes it possible for a DocTransformer to access the score. The failing tests relate to sorting by an expression that makes use of the score. If we mark those tests with an Ignore and add a follow-up to improve that matter, then I think we have something mergeable. Progress not perfection; right?

Addressing the score issue is complicated by the fact that Lucene's ValueSource API is legacy, has no concept of needsScores, but the replacement DoubleValuesSource does have this concept. Solr is still ValueSource based. I could see doing a switch for Solr 10 but that's for another issue and dev list discussion.

dsmiley avatar May 06 '25 17:05 dsmiley

The ability to reference scores in a custom ValueSource based on DoubleValuesSource won't work with sorting. I filed this PR https://github.com/apache/lucene/pull/14654 to Lucene to trivially fix that issue. But since Lucene 9.x is EOL; I don't know when we might expect that if ever in Solr 9.

dsmiley avatar May 13 '25 04:05 dsmiley

@hossman as the original author, maybe you'd like to review this?

dsmiley avatar Jul 07 '25 12:07 dsmiley

I cherry-picked the Lucene 9.12.2 upgrade (thanks @HoustonPutman ), removed the Ignore annotation from the 2 tests, fixed a minor test bug that one Ignored test was hiding, and confirmed that this Lucene upgrade addressed the lingering concern :-) I didn't push the Lucene upgrade commit so as to not confuse this PR, and thus those 2 tests will fail.

@houston not sure if you have concerns on this getting into 9.9.

dsmiley avatar Jul 08 '25 02:07 dsmiley

@AndreyBozhko you might be a good reviewer here

dsmiley avatar Jul 09 '25 05:07 dsmiley

making sure functions can access scores in distributed queries,

What would go wrong if this functionality were used in such? An aside: I have a wish for our tests to be able to run in both modes easily. I've used that strategy at work. The test here is based on our oldest test infrastructure, and that which wouldn't easily support that. I'd like to see further use of the new SolrClientTestRule such as using SolrCloud; there's a JIRA for that.

exposing score to functions that act as post-filters.

I saw that one-liner you added for frange specifically (which I wish wasn't a PostFilter, but I digress). Is there a more general adaptation?

Do you have concerns with the merge-ability of this PR as it stands for 9.9?

dsmiley avatar Jul 10 '25 21:07 dsmiley

DocList / DocIterator / DocIterationInfo != a Lucene Scorable (which is an abstract class, BTW; limits creative class hierarchy possibilities). Furthermore the former is an iterator but the latter returns the current state. These things aren't compatible.

@HoustonPutman I don't want to push too last minute for this in 9.9 unless you are comfortable with this as the RM. I think this is a great feature that has waited over 11 years too long.

dsmiley avatar Jul 12 '25 02:07 dsmiley

@dsmiley I just started producing the 9.9.0 release, so let's leave this for 9.10. We don't have to wait months for 9.10 either, we can do it in a month or two and get this out after baking for a little bit.

HoustonPutman avatar Jul 12 '25 02:07 HoustonPutman

Actually I’m having issues with the smoketester erroring in the tests, so if you can get this in by Monday and are sure it wont break the tests, go for it. It seems like a new feature, so im not too worried about it breaking anything other than the tests.

HoustonPutman avatar Jul 13 '25 05:07 HoustonPutman

Yep I'm confident in it. It's all merged.

dsmiley avatar Jul 14 '25 03:07 dsmiley