[Schema streaming mode] Enhance rank calculation for substring search
Is your feature request related to a problem? Please describe.
- Given the schema as:
    document test {
        field description type string {
            indexing: summary | index
            match: substring
        }
    }
- And a document is created with description=environmental
- Then the following 2 search requests
    select * from test where description contains 'environment'
    select * from test where description contains 'env'
  return the matching doc with exactly the same score/relevance=0.38
Describe the solution you'd like
Considering the sample above, the request with search_term=environment should have a higher score than the request with search_term=env, because it covers more of the matched word.
Isn't this a bug? Vespa's documentation says: "...Streaming search uses the same implementation of most features in Vespa, including ranking, matching and grouping, and supports the same features...". We are working on hybrid search in streaming mode and rely heavily on correct ranking. Thanks
The documentation is not perfect; there are a few differences. We are currently trying to reduce the gap, but there will always be some differences. Streaming search has a larger feature set, especially for matching, because the raw text is always available there. Substring matching is a feature only available in streaming search. That is why improving the ranking here is an enhancement, and not a bug.
We would appreciate it if you could prioritize this issue.
@jamesbond7
Vespa index mode doesn't support substring, so you could not match env against environment - so this is obviously an enhancement and not a bug.
Yes, this is a new feature, but one that makes sense. How about creating a separate rank feature ("matchAccuracy"?) that gives the term-weighted average of the closeness of the match of the term to the field? Could also potentially use it with multiple stems.
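To make the proposal concrete, here is a rough Python sketch of what such a feature could compute. It is not Vespa code and nothing here exists in Vespa today: the names `QueryTerm`, `closeness` and `match_accuracy` are hypothetical, and "closeness" is assumed to be the fraction of the matched field token that the query term covers (term length divided by token length, best token wins). Other definitions of closeness would be equally valid.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTerm:
    """Hypothetical query term with a weight (not a Vespa type)."""
    text: str
    weight: float = 1.0


def closeness(term: str, field_text: str) -> float:
    """Assumed closeness: len(term) / len(token) for the best field token
    that contains the term as a substring; 0.0 if no token matches."""
    term = term.lower()
    if not term:
        return 0.0
    best = 0.0
    for token in field_text.lower().split():
        if term in token:
            best = max(best, len(term) / len(token))
    return best


def match_accuracy(terms: list[QueryTerm], field_text: str) -> float:
    """Term-weighted average of closeness over all query terms."""
    total_weight = sum(t.weight for t in terms)
    if total_weight <= 0:
        return 0.0
    weighted = sum(t.weight * closeness(t.text, field_text) for t in terms)
    return weighted / total_weight


if __name__ == "__main__":
    field = "environmental"
    print(match_accuracy([QueryTerm("environment")], field))  # 11/13
    print(match_accuracy([QueryTerm("env")], field))          # 3/13
```

With this definition, 'environment' covers 11 of the 13 characters of "environmental" while 'env' covers only 3, so the two queries from the issue would no longer tie. A rank profile could then combine such a value with the existing relevance score.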