neural-search
[FEATURE] Hybrid search and collapse compatibility
Describe the bug
Using the collapse feature in a hybrid search does not collapse documents.
Related component
Search
To Reproduce
I'm trying to combine hybrid search (semantic + keyword) with the collapse feature to deduplicate products that share the same visual.
I have tried collapse on a basic search, and it works well.
With hybrid search, the behaviour is different: products from the same visual are placed in the inner_hits field but are not collapsed (they are still present at the root level of the search results), which is not the expected behaviour.
Is anyone aware of a compatibility problem between hybrid and collapse?
Expected behavior
I expect the same behaviour as when performing a collapse on a non-hybrid search.
Additional Details
Host/Environment (please complete the following information):
- OS: AWS
- Version: 2.11
Additional context
Basic search with collapse (working as expected):
GET /product_1/_search
{
  "_source": {
    "includes": ["_id", "name", "category_name", "visual.id_visual"]
  },
  "query": {
    "match": {
      "name": {
        "query": "Ski"
      }
    }
  },
  "collapse": {
    "field": "visual.id_visual",
    "inner_hits": {
      "size": 1,
      "name": "from_same_visual",
      "sort": [
        {
          "_score": "desc"
        }
      ]
    }
  }
}
Hybrid search with collapse (not working):
GET /product_1/_search?search_pipeline=search_pipeline
{
  "_source": {
    "includes": ["_id", "name", "category_name", "visual.id_visual"]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "neural": {
            "fullname_v": {
              "query_text": "Ski",
              "model_id": "xxx",
              "k": 200
            }
          }
        },
        {
          "multi_match": {
            "query": "Ski",
            "type": "most_fields",
            "fields": ["category.name^2", "name^4", "tags.name^3"],
            "fuzziness": "AUTO",
            "prefix_length": 0,
            "max_expansions": 10
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "visual.id_visual",
    "inner_hits": {
      "size": 1,
      "name": "from_same_visual",
      "sort": [
        {
          "_score": "desc"
        }
      ]
    }
  }
}
[Triage - attendees 1 2 3 4 5 6 7 8] @opensearch-project/admin Could you transfer this to the neural-search repository? It seems related to that plugin's functionality.
@qmauret The collapse functionality is not supported by the hybrid query. The team will look into the feasibility of adding it.
Hi, I'm having the same issue. For certain customers with products and product variants, it is ugly to have the same result repeated (e.g. variants that differ only in product size, so the image is the same).
Would be nice to have this feature for hybrid queries :+1:
cc: @minalsha , @vamshin.
One more customer request: https://opensearch.slack.com/archives/C0539F41Z5X/p1722335776347039
One more customer ask for this feature: https://opensearch.slack.com/archives/C05RCMNQY8N/p1734071135199889
I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.
Hi, I encountered the same issue. It's affecting my workflow, and I'm eager to know if there are any updates or possible workarounds. I checked what David suggested, but it looks like it does not consider the overall result set.
I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.
Looking forward to any insights from the team—thanks for your efforts!
Collapse is currently not supported for hybrid query, but we are working to implement it. From my understanding, the collapse processor works with hybrid query. You can attach the processor to a search pipeline that is configured for using hybrid query.
Here is a link to the documentation for that processor: https://opensearch.org/docs/latest/search-plugins/search-pipelines/collapse-processor/
Hi @ryanbogan, I did check that, but because it is a search response processor, it collapses only the results returned from the hybrid query. The expectation, however, is to collapse all results matching a search query, irrespective of the size parameter.
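To make the limitation described in this comment concrete, here is a minimal Python sketch. It is a plain-Python simulation, not OpenSearch code: the hit list, the `collapse_hits` helper, and the `visual` key are all hypothetical and only illustrate the difference between collapsing the hits a query has already returned and collapsing over every match.

```python
from typing import Any

Hit = dict[str, Any]


def collapse_hits(hits: list[Hit], field: str) -> list[Hit]:
    """Keep only the first (highest-ranked) hit for each distinct value of `field`."""
    seen: set[Any] = set()
    collapsed: list[Hit] = []
    for hit in hits:
        key = hit[field]
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed


# Hypothetical ranked matches: six products sharing three visuals.
ranked = [
    {"id": "p1", "visual": "v1", "score": 0.95},
    {"id": "p2", "visual": "v1", "score": 0.90},
    {"id": "p3", "visual": "v1", "score": 0.85},
    {"id": "p4", "visual": "v2", "score": 0.80},
    {"id": "p5", "visual": "v2", "score": 0.75},
    {"id": "p6", "visual": "v3", "score": 0.70},
]
size = 3

# Response-side collapse: the query first returns `size` hits, then they are collapsed.
post_collapse = collapse_hits(ranked[:size], "visual")

# Query-level collapse: collapse over all matches, then keep `size` groups.
query_collapse = collapse_hits(ranked, "visual")[:size]

print([h["id"] for h in post_collapse])   # ['p1']
print([h["id"] for h in query_collapse])  # ['p1', 'p4', 'p6']
```

With `size` set to 3, the response-side variant returns a single product because the top three hits share one visual, while collapsing over all matches still fills the page with three distinct visuals.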
@ankitas3 You are correct, that is a limitation of using the collapse processor. We are working to support adding collapse to a hybrid query, but it is currently not supported. In order to get more results considered, the best option currently available is to use the oversample and collapse combination linked above.
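For readers who want to try the oversample and collapse combination, the following Python sketch builds a search pipeline body and prints it as a console request. Several things here are assumptions rather than details from this thread: the pipeline name, the `sample_factor` of 3, and the `min_max` / `arithmetic_mean` normalization settings (the reporter's actual pipeline is not shown). As far as I can tell from the documentation, the `oversample`, `collapse`, and `truncate_hits` processors require OpenSearch 2.12 or later, which is newer than the 2.11 version reported above.

```python
import json

# Hypothetical pipeline name, chosen for illustration only.
PIPELINE_NAME = "hybrid_collapse_pipeline"


def build_pipeline(collapse_field: str, sample_factor: float = 3.0) -> dict:
    """Build a search pipeline body for the oversample/collapse/truncate pattern."""
    return {
        # Multiply the requested size so that more hits reach the collapse step.
        "request_processors": [
            {"oversample": {"sample_factor": sample_factor}}
        ],
        # Score normalization for the hybrid query (techniques are assumed here).
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {"technique": "arithmetic_mean"},
                }
            }
        ],
        # Collapse the oversampled hits, then cut back to the originally requested size.
        "response_processors": [
            {"collapse": {"field": collapse_field}},
            {"truncate_hits": {}},
        ],
    }


body = build_pipeline("visual.id_visual")
print(f"PUT /_search/pipeline/{PIPELINE_NAME}")
print(json.dumps(body, indent=2))
```

A larger `sample_factor` lets more candidates reach the collapse step at the cost of fetching more hits. It still does not collapse over the full result set, so a page can come back with fewer than `size` results when many top hits share the same collapse value.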
I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.
Worth mentioning: in my case I had to specify the same collapse in both the search pipeline and the search request. Otherwise, when using the search pipeline alone, the results collapsed to just a single hit.