
[FEATURE] Hybrid search and collapse compatibility

Status: Open. qmauret opened this issue 1 year ago • 11 comments

Describe the bug

Using the collapse feature in a hybrid search does not collapse documents.

Related component

Search

To Reproduce

I’m trying to combine hybrid search (semantic + keyword) with the collapse feature to deduplicate products that share the same visual.

I have tried collapse on a basic (non-hybrid) search, and it works as expected.

With hybrid search, the behaviour is different. Products from the same visual are placed in the `inner_hits` field, but they are not collapsed: they still appear at the root level of the search results, which is not the expected behaviour.

Is anyone aware of a compatibility problem between hybrid search and collapse?

Expected behavior

I expect the same behaviour as collapse on a non-hybrid search.

Additional Details

Host/Environment (please complete the following information):

  • OS: AWS
  • Version: 2.11

Additional context

Basic search with collapse (working as expected):

```json
GET /product_1/_search
{
  "_source": {
    "includes": ["_id", "name", "category_name", "visual.id_visual"]
  },
  "query": {
    "match": {
      "name": {
        "query": "Ski"
      }
    }
  },
  "collapse": {
    "field": "visual.id_visual",
    "inner_hits": {
      "size": 1,
      "name": "from_same_visual",
      "sort": [
        {
          "_score": "desc"
        }
      ]
    }
  }
}
```

Hybrid search with collapse (not working):

```json
GET /product_1/_search?search_pipeline=search_pipeline
{
  "_source": {
    "includes": ["_id", "name", "category_name", "visual.id_visual"]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "neural": {
            "fullname_v": {
              "query_text": "Ski",
              "model_id": "xxx",
              "k": 200
            }
          }
        },
        {
          "multi_match": {
            "query": "Ski",
            "type": "most_fields",
            "fields": ["category.name^2", "name^4", "tags.name^3"],
            "fuzziness": "AUTO",
            "prefix_length": 0,
            "max_expansions": 10
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "visual.id_visual",
    "inner_hits": {
      "size": 1,
      "name": "from_same_visual",
      "sort": [
        {
          "_score": "desc"
        }
      ]
    }
  }
}
```

qmauret, Mar 29 '24

[Triage - attendees 1 2 3 4 5 6 7 8] @opensearch-project/admin Could you transfer this to the neural-search repository? This seems related to its functionality.

peternied, Apr 03 '24

@qmauret The collapse functionality is not supported by the hybrid query. The team will look into the feasibility of adding it.

martin-gaievski, Apr 30 '24

Hi, I'm having the same issue. For certain customers with products and product variants, it looks bad to have the same result repeated (e.g. variants that differ only in product size, so the image is the same).

Would be nice to have this feature for hybrid queries :+1:

sonic182, Aug 29 '24

cc: @minalsha, @vamshin.

One more customer request: https://opensearch.slack.com/archives/C0539F41Z5X/p1722335776347039

navneet1v, Sep 14 '24

One more customer asking for this feature: https://opensearch.slack.com/archives/C05RCMNQY8N/p1734071135199889

martin-gaievski, Dec 16 '24

I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.
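A minimal sketch of that pattern for the hybrid query in this issue follows. It assumes a cluster that provides the `oversample`, `collapse` and `truncate_hits` search processors (as far as I know they were introduced in OpenSearch 2.12, so they would not be available on the 2.11 domain from the original report). The pipeline name `hybrid_oversample_collapse_pipeline`, the `sample_factor` of 5 and the `min_max`/`arithmetic_mean` normalization settings are illustrative assumptions, not values taken from this thread.

```json
PUT /_search/pipeline/hybrid_oversample_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "description": "Multiply the requested size so collapse has more hits to work with",
        "sample_factor": 5
      }
    }
  ],
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": { "technique": "arithmetic_mean" }
      }
    }
  ],
  "response_processors": [
    {
      "collapse": {
        "field": "visual.id_visual"
      }
    },
    {
      "truncate_hits": {
        "description": "Truncate back to the size requested before oversampling"
      }
    }
  ]
}
```

With a request `size` of 10, `oversample` should raise the size to 50 and remember the original value, `collapse` then discards hits whose `visual.id_visual` was already seen earlier in the results, and `truncate_hits` (with no `target_size`) cuts the response back to the original 10. Collapsing still only sees the oversampled window, so duplicates beyond those 50 hits are not taken into account.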

david-albrecht-xometry, Feb 18 '25

Hi, I encountered the same issue. It’s affecting my workflow, and I’m eager to know whether there are any updates or possible workarounds. I checked what David suggested, but it looks like it will not consider the overall result set.

> I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.

Looking forward to any insights from the team. Thanks for your efforts!

ankitas3, Feb 19 '25

Collapse is currently not supported for the hybrid query, but we are working to implement it. From my understanding, the collapse processor does work with the hybrid query: you can attach the processor to the search pipeline that is configured for the hybrid query.

Here is a link to the documentation for that processor: https://opensearch.org/docs/latest/search-plugins/search-pipelines/collapse-processor/
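For reference, attaching only the `collapse` response processor to a pipeline that also normalizes hybrid scores could look like the sketch below. The pipeline name `hybrid_collapse_pipeline` and the normalization settings are hypothetical; `visual.id_visual` is the collapse field from the original report. Because `collapse` runs as a response processor, it can only deduplicate the hits that are already in the response.

```json
PUT /_search/pipeline/hybrid_collapse_pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": { "technique": "arithmetic_mean" }
      }
    }
  ],
  "response_processors": [
    {
      "collapse": {
        "field": "visual.id_visual"
      }
    }
  ]
}
```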

ryanbogan, Feb 20 '25

Hi @ryanbogan, I did check that, but because it is a search response processor, it collapses only the results returned by the hybrid query, whereas the expectation is to collapse all results matching the query, irrespective of the `size` parameter.

ankitas3, Feb 20 '25

@ankitas3 You are correct, that is a limitation of the collapse processor. We are working to support collapse in the hybrid query, but it is currently not supported. To have more results considered before collapsing, the best option currently available is the oversample and collapse combination linked above.

ryanbogan, Feb 20 '25

> I also encountered the unexpected behavior that collapse doesn't work with hybrid queries, but I've found the oversample/collapse/truncate search pipeline pattern does work.

Worth mentioning: in my case I had to specify the same collapse both in the search pipeline and in the search request. Otherwise, when using the search pipeline, it would collapse to a single result only!
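A sketch of the request shape described in this comment, trimmed from the hybrid query in the original report. `hybrid_oversample_collapse_pipeline` is a hypothetical pipeline name, assumed to contain a `collapse` processor on `visual.id_visual`; repeating the same `collapse` block in the request body reflects this commenter's reported workaround, not documented behaviour.

```json
GET /product_1/_search?search_pipeline=hybrid_oversample_collapse_pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "neural": {
            "fullname_v": { "query_text": "Ski", "model_id": "xxx", "k": 200 }
          }
        },
        {
          "multi_match": {
            "query": "Ski",
            "fields": ["category.name^2", "name^4", "tags.name^3"]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "visual.id_visual"
  }
}
```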

AmazingTurtle, Mar 03 '25