[BUG] IndexOutOfBoundsException in Hybrid search for some queries only
What is the bug?
I'm using hybrid search in OpenSearch version 2.11, and I'm getting the following error for some queries:
{
"error": {
"root_cause": [
{
"type": "index_out_of_bounds_exception",
"reason": null
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "index-name",
"node": "2aLNPmEjQ8OYCuCFyEyI-Q",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": null
}
}
],
"caused_by": {
"type": "index_out_of_bounds_exception",
"reason": null,
"caused_by": {
"type": "index_out_of_bounds_exception",
"reason": null
}
}
},
"status": 500
}
I get these logs:
2023-11-21 16:53:16 org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:706) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:745) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:503) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:301) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TransportService$6.handleException(TransportService.java:903) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1526) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1640) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1614) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:80) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:72) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-11-21 16:53:16 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-11-21 16:53:16 at java.lang.Thread.run(Thread.java:833) [?:?]
2023-11-21 16:53:16 Caused by: org.opensearch.OpenSearchException$3
2023-11-21 16:53:16 at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 ... 23 more
2023-11-21 16:53:16 Caused by: java.lang.IndexOutOfBoundsException
2023-11-21 16:53:16 at java.nio.Buffer.checkIndex(Buffer.java:743) ~[?:?]
2023-11-21 16:53:16 at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:339) ~[?:?]
2023-11-21 16:53:16 at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.TermScorer.score(TermScorer.java:75) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:273) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.opensearch.neuralsearch.query.HybridQueryScorer.score(HybridQueryScorer.java:64) ~[?:?]
2023-11-21 16:53:16 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:61) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.opensearch.common.lucene.MinimumScoreCollector.collect(MinimumScoreCollector.java:78) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:274) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:322) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:354) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:441) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:425) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:65) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66) ~[?:?]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.0.jar:2.11.0]
2023-11-21 16:53:16 ... 8 more
How can one reproduce the bug?
Honestly, it's very hard to reproduce the bug. As I'm using my company's data, I cannot share it publicly. However, I'm happy to work on this privately.
What is the expected behavior?
The expected behavior is that hybrid search completes without this error.
What is your host/environment?
macOS Ventura 13.3.1, running OpenSearch via Docker Compose.
Do you have any additional context?
When I run semantic search or lexical search on the exact same index, it works properly; the error only happens for hybrid search. I observe a pattern where queries with more than one word are more likely to hit this error than simple queries; queries that failed include "horror movies", "teen mom", and "news radio". However, when I changed the combination technique, some queries started working and other queries started failing, and the same happened when I changed the index data. For the same data and same settings, though, the results are consistent.
@tiagoshin can you share the query which you are using?
I can see the exception is coming from this: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java#L64
@tiagoshin please share your query skeleton so that we can better debug the issue here.
Thank you @navneet1v. I shared the query skeleton with David Fowler from AWS customer support; did you receive it?
@tiagoshin Looking at the logs that were shared, I can see that HybridQueryPhaseSearcher, which is responsible for running the query, is not invoked. This leads me to believe that either the hybrid query clause was not the top-level clause, or there are nested fields in the index, which leads to the hybrid query clause being wrapped in other query clauses (this is OpenSearch's default behavior).
We are already working on a fix for nested query clauses as part of this GitHub issue: https://github.com/opensearch-project/neural-search/issues/466.
Hi @navneet1v, isn't HybridQueryPhaseSearcher invoked in the following line?
2023-11-21 16:53:16 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66) ~[?:?]
@tiagoshin if you look at the code: https://github.com/opensearch-project/neural-search/blob/2.11/src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java#L66
Line 66 will be hit if the top-level query is not a hybrid query.
That makes sense, thank you @navneet1v!
We have pushed a code change that should fix this issue; please check the details in this issue comment: https://github.com/opensearch-project/neural-search/issues/466#issuecomment-1854855662
I'm getting a similar, but different exception, on OS 2.11.1 (6b1986e964d440be9137eba1413015c31c5a7752):
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:100) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.neuralsearch.query.HybridQueryScorer.initializeSubScorersPQ(HybridQueryScorer.java:146) ~[?:?]
at org.opensearch.neuralsearch.query.HybridQueryScorer.<init>(HybridQueryScorer.java:47) ~[?:?]
at org.opensearch.neuralsearch.query.HybridQueryWeight.scorer(HybridQueryWeight.java:91) ~[?:?]
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:374) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:319) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.1.jar:2.11.1]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:104) ~[?:?]
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:64) ~[?:?]
at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.1.jar:2.11.1]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.1.jar:2.11.1]
... 8 more
Full exception: aioobe.txt
Unfortunately I'm not familiar enough with the subject matter to know if this is the same exception or if it has been patched. I get this error more reproducibly on my single-node cluster with only 8800 documents and the following search pipeline and query:
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {
"weights": [
0.6,
0.3,
0.1
]
}
}
}
}
]
}
Query:
{
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"exclude": [
"text_embedding"
]
},
"query": {
"hybrid": {
"queries": [
{
"match_phrase": {
"text": {
"query": "foo"
}
}
},
{
"match": {
"text": {
"query": "foo"
}
}
},
{
"neural": {
"text_embedding": {
"query_text": "foo",
"model_id": "--------",
"k": 5
}
}
}
]
}
}
}
I have narrowed down the issue to occurring when one or more of the sub-queries returns effectively 0 results after normalization. That is, the scores are so low after normalization that they are completely discarded. If I remove two of the sub-queries and disable the search pipeline, the query works. Or if I make a more specific query where the sub-queries return a similar number of results, the query also works.
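For reference, here is roughly what I mean by scores being squashed after normalization. This is just a minimal sketch of min_max as I understand it, not the plugin's actual implementation; the sample scores and the handling of the max == min case are my own assumptions:
public class MinMaxSketch {
    public static void main(String[] args) {
        // Hypothetical raw scores from one sub-query on a single shard
        float[] scores = {12.40f, 12.39f, 0.01f};
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        for (int i = 0; i < scores.length; i++) {
            // min_max rescales each score to [0, 1]; the lowest raw score lands at 0.0,
            // which is what I mean by a hit being "effectively discarded"
            float normalized = (max == min) ? 1.0f : (scores[i] - min) / (max - min);
            System.out.printf("raw=%.2f normalized=%.4f%n", scores[i], normalized);
        }
    }
}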
I'm happy to provide more information if needed, or make a new issue if it's not the same one as this/#466. I'm running in Docker, so not quite sure how to test the RC build from that thread.
Edit: also tried on 2.12.0, still happening. Is this new issue material?
@Lemmmy so you are saying that you tried the tar provided in this comment: https://github.com/opensearch-project/neural-search/issues/466#issuecomment-1854855662 and it is still not working?
cc: @martin-gaievski
I'm running in Docker, so not quite sure how to test the RC build from that thread.
@Lemmmy the OpenSearch CI publishes builds every day to the OpenSearch staging repo on Docker Hub: https://hub.docker.com/r/opensearchstaging/opensearch/tags
You can run docker pull opensearchstaging/opensearch:2.12.0 to pull the 2.12.0 version of OpenSearch and see if the issue still exists.
@Lemmmy I did some more deep-dive and I am able to reproduce the issue. I also tested with different queries where one query clause doesn't yield any result; that use case works perfectly.
But I was able to figure out the root cause of the exception you are getting. Here are the steps to reproduce:
Setup
PUT example-index
{
"settings": {
"index": {
"knn": true,
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"text": {
"type": "text"
},
"my_vector": {
"type": "knn_vector",
"dimension": 1,
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "lucene"
}
},
"integer": {
"type": "integer"
}
}
}
}
PUT example-index/_bulk?refresh
{"index":{"_id":"1"}}
{"text": "neural","my_vector": [5], "integer": 1 }
{"index":{"_id":"2"}}
{"text": "neural neural","my_vector": [4], "integer": 2 }
{"index":{"_id":"3"}}
{"text": "neural neural neural","my_vector": [3], "integer": 3 }
{"index":{"_id":"4"}}
{"text": "neural neural neural neural", "integer": 4 }
{"index":{"_id":"5"}}
{"my_vector": [0], "integer": 5 }
PUT /_search/pipeline/nlp-search-pipeline
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
}
}
}
]
}
# Search Query
POST example-index/_search?search_pipeline=nlp-search-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"term": {
"text": "neural"
}
},
{
"term": {
"text": "neural"
}
},
{
"knn": {
"my_vector": {
"vector": [
3
],
"k": 3
}
}
}
]
}
},
"size": 3
}
Output of Search
{
"error": {
"root_cause": [
{
"type": "array_index_out_of_bounds_exception",
"reason": "Index 2 out of bounds for length 2"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "example-index",
"node": "roL2TjVsTdex976hXKl9jg",
"reason": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 2 out of bounds for length 2"
}
}
],
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 2 out of bounds for length 2",
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 2 out of bounds for length 2"
}
}
},
"status": 500
}
Stacktrace
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:100) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.neuralsearch.query.HybridQueryScorer.initializeSubScorersPQ(HybridQueryScorer.java:146) ~[?:?]
at org.opensearch.neuralsearch.query.HybridQueryScorer.<init>(HybridQueryScorer.java:47) ~[?:?]
at org.opensearch.neuralsearch.query.HybridQueryWeight.scorer(HybridQueryWeight.java:91) ~[?:?]
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:374) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:319) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:104) ~[?:?]
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:64) ~[?:?]
at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:280) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:153) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.11.1-SNAPSHOT.jar:2.11.1-SNAPSHOT]
... 8 more
Root Cause
So, what is happening here: if we look at the queries provided in the hybrid clause, I have deliberately made my two text search queries exactly the same.
{
"term": {
"text": "neural"
}
}
We create a map of Query to index (the key being the query object) here and use that map here to create the PQ and to assign the scorers created for each query. Because both text queries are the same, the map we create ends up with size 2 instead of size 3 (even though we have 3 queries), which leads to the exception.
Now, in production I don't expect users to provide two exactly identical queries, but this is a bug nonetheless.
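To illustrate the collapse, here is a simplified, standalone sketch (not the plugin code; the third query is just a stand-in for the knn clause):
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DuplicateQueryCollapse {
    public static void main(String[] args) {
        Query first = new TermQuery(new Term("text", "neural"));
        Query second = new TermQuery(new Term("text", "neural")); // equal to the first query
        Query third = new TermQuery(new Term("text", "other"));   // stand-in for the knn clause

        Map<Query, Integer> queryToIndex = new HashMap<>();
        queryToIndex.put(first, 0);
        queryToIndex.put(second, 1); // overwrites the entry for "first" because the queries are equal
        queryToIndex.put(third, 2);

        // 2 entries for 3 sub-queries, so anything sized from this map
        // (like the DisiPriorityQueue) is one slot too small.
        System.out.println(queryToIndex.size()); // prints 2
    }
}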
Please let me know if removing the duplicate queries solves your issue.
Proposed Solution
We should go ahead and throw an exception with a proper message telling the user that the defined queries contain duplicates. @Lemmmy please let me know your thoughts on this.
cc: @martin-gaievski
@tiagoshin I did some deep-dive here: https://github.com/opensearch-project/neural-search/issues/497#issuecomment-1862259133. Can you check on your side whether this was also the issue for you? If not, can you provide the query skeleton so that I can make sure that all bugs reported in this issue are resolved? I understand that your query contained nested fields, which we have already fixed for 2.12. But if there is any other issue that you are facing, please do comment so that it can be fixed in 2.12.
Thanks for the quick investigation! To clarify, am I supposed to avoid combining match and neural with the same query? As in, this isn't okay? (from the docs):
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": {
"query": "Hi world"
}
}
},
{
"neural": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "aVeif4oB5Vm0Tdw8zYO2",
"k": 5
}
}
}
]
}
}
Or is it just because of my use of both match_phrase and match?
When changing this line: https://github.com/opensearch-project/neural-search/blob/5daddfd2f10486ad4688742aa26db82461b6a73c/src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java#L140
To:
-DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());
+DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(subScorers.size());
The query I provided in https://github.com/opensearch-project/neural-search/issues/497#issuecomment-1859324729 no longer errors and the results look roughly as I'd expect.
Thanks for the quick investigation! To clarify, am I supposed to avoid combining match and neural with the same query? As in, this isn't okay? (from the docs) Or is it just because of my use of both match_phrase and match?
This is okay.
But in your case:
{
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"exclude": [
"text_embedding"
]
},
"query": {
"hybrid": {
"queries": [
{
"match_phrase": {
"text": {
"query": "foo"
}
}
},
{
"match": {
"text": {
"query": "foo"
}
}
},
{
"neural": {
"text_embedding": {
"query_text": "foo",
"model_id": "--------",
"k": 5
}
}
}
]
}
}
}
The match_phrase and match clauses actually boil down to the same query, and hence the issue was happening.
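Here is a small Lucene-level illustration of that collapse. It is only a sketch assuming a standard analyzer; OpenSearch builds these clauses through its own parsers, but for a one-token query string the net effect is similar:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class SingleTokenCollapse {
    public static void main(String[] args) {
        QueryBuilder builder = new QueryBuilder(new StandardAnalyzer());
        // For a one-token query string, a "match"-style boolean query and a
        // "match_phrase"-style phrase query both reduce to the same term query.
        Query matchLike = builder.createBooleanQuery("text", "foo");
        Query phraseLike = builder.createPhraseQuery("text", "foo");
        System.out.println(matchLike);                    // text:foo
        System.out.println(phraseLike);                   // text:foo
        System.out.println(matchLike.equals(phraseLike)); // true -> duplicates in the hybrid clause
    }
}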
Ah, that makes a lot more sense, I will fix that then. Thanks for all your help.
Sure, I am planning to add a check: if we find that the queries are duplicates, we throw an exception from here: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java#L297
like this:
if(queries.size() != new HashSet<>(queries).size()) {
throw new OpenSearchException("There are duplicates in the query.");
}
This will ensure that such queries are not run, because if we make this change
-DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());
+DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(subScorers.size());
it has some other side effects in the code.
Hi @navneet1v, thank you very much for your attention. I'm testing the 2.12.0 RC build and now I'm getting different errors. For all queries, when I perform hybrid search, I get:
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "test",
"node": "si1uOQWhRMWsWbFC6kaKjg",
"reason": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
}
},
"status": 400
}
So I increased track_total_hits to 50,000 and it worked for some queries. For other queries, I got the following error:
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "The phase has failed",
"phase": "query",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "illegal_state_exception",
"reason": "Score normalization processor cannot produce final query result"
}
},
"status": 500
}
Here are the logs:
2023-12-19 18:25:36 opensearch_semantic1 | org.opensearch.action.search.SearchPhaseExecutionException: The phase has failed
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:718) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:622) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:607) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:590) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:373) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:132) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:590) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:161) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:292) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:44) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:99) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:70) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:746) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.transport.TransportService$9.handleResponse(TransportService.java:1693) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1475) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1558) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1538) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:72) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:62) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:45) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2023-12-19 18:25:36 opensearch_semantic1 | Caused by: org.opensearch.search.pipeline.SearchPipelineProcessingException: java.lang.IllegalStateException: Score normalization processor cannot produce final query result
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:295) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:47) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:755) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:620) [opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | ... 31 more
2023-12-19 18:25:36 opensearch_semantic1 | Caused by: java.lang.IllegalStateException: Score normalization processor cannot produce final query result
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.getSearchHits(NormalizationProcessorWorkflow.java:177) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.updateOriginalFetchResults(NormalizationProcessorWorkflow.java:142) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.execute(NormalizationProcessorWorkflow.java:73) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.neuralsearch.processor.NormalizationProcessor.process(NormalizationProcessor.java:62) ~[?:?]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.search.pipeline.SearchPhaseResultsProcessor.process(SearchPhaseResultsProcessor.java:48) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:276) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:47) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:755) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:620) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-19 18:25:36 opensearch_semantic1 | ... 31 more
@tiagoshin can you share the query skeleton with me so that I can reproduce the issue? BTW, are you setting track_total_hits in the query?
@navneet1v I shared the query and artifacts with David Fowler. Could you please get them from him?
@navneet1v I got the same IndexOutOfBoundsException issue that I reported before, on version 2.12.0, when increasing the ef_construction parameter to 1024. Before that, the exact same query with the same data and model was working. Once I increased the ef_construction parameter, I got the following error:
{
"error": {
"root_cause": [
{
"type": "index_out_of_bounds_exception",
"reason": null
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "pluto-test2",
"node": "j4rUlY77ToenCAVXWKUnxA",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": null
}
}
],
"caused_by": {
"type": "index_out_of_bounds_exception",
"reason": null,
"caused_by": {
"type": "index_out_of_bounds_exception",
"reason": null
}
}
},
"status": 500
}
In the logs I see:
2023-12-21 14:39:34 org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:718) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:757) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:511) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:301) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1699) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1485) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1599) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1573) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:73) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2023-12-21 14:39:34 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2023-12-21 14:39:34 at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2023-12-21 14:39:34 Caused by: org.opensearch.OpenSearchException$3
2023-12-21 14:39:34 at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 ... 23 more
2023-12-21 14:39:34 Caused by: java.lang.IndexOutOfBoundsException
2023-12-21 14:39:34 at java.base/java.nio.Buffer.checkIndex(Buffer.java:687) ~[?:?]
2023-12-21 14:39:34 at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269) ~[?:?]
2023-12-21 14:39:34 at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.TermScorer.score(TermScorer.java:86) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:266) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:117) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:65) ~[?:?]
2023-12-21 14:39:34 at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:277) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:326) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:549) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:219) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:72) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:547) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:611) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:580) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 ... 8 more
However, if I decrease ef_construction, the queries that were getting the error reported here keep having the same error. So decreasing ef_construction doesn't solve the other issues, but increasing it may cause this error.
@navneet1v I got the same issue that I reported before about the IndexOutOfBoundsException on version 2.12.0 when increasing the
ef_constructionparameter to 1024. Before that, the exact same query with the same data and model was working for a particular query. Once I increased theef_constructionparameter, I got the following error:Block (35 lines)
{ "error": { "root_cause": [ { "type": "index_out_of_bounds_exception", "reason": null } ], "type": "search_phase_execution_exception", "reason": "all shards failed", "phase": "query", "grouped": true, "failed_shards": [ { "shard": 0, "index": "pluto-test2", "node": "j4rUlY77ToenCAVXWKUnxA", "reason": { "type": "index_out_of_bounds_exception", "reason": null } } ], "caused_by": { "type": "index_out_of_bounds_exception", "reason": null, "caused_by": { "type": "index_out_of_bounds_exception", "reason": null } } }, "status": 500 }On the logs I see:
Block (69 lines)
2023-12-21 14:39:34 org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:718) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:757) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:511) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:301) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1699) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1485) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1599) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1573) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:73) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2023-12-21 14:39:34 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2023-12-21 14:39:34 at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2023-12-21 14:39:34 Caused by: org.opensearch.OpenSearchException$3
2023-12-21 14:39:34 at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 ... 23 more
2023-12-21 14:39:34 Caused by: java.lang.IndexOutOfBoundsException
2023-12-21 14:39:34 at java.base/java.nio.Buffer.checkIndex(Buffer.java:687) ~[?:?]
2023-12-21 14:39:34 at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269) ~[?:?]
2023-12-21 14:39:34 at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:115) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:443) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.TermScorer.score(TermScorer.java:86) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:266) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:117) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:65) ~[?:?]
2023-12-21 14:39:34 at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:277) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:326) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:549) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWithCollector(HybridQueryPhaseSearcher.java:219) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:72) ~[?:?]
2023-12-21 14:39:34 at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:547) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:611) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:580) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
2023-12-21 14:39:34 ... 8 more

However, if I decrease the ef_construction, the queries that were getting the error reported here kept having the same error. So decreasing ef_construction doesn't solve other issues, but increasing it may cause this error.
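For context on where this parameter lives: ef_construction is an HNSW graph build-time setting defined in the knn_vector field mapping when the index is created. A minimal sketch of the kind of mapping I'm changing; the index name, field name, engine, dimension, and values below are illustrative placeholders, not my exact index definition:

```json
PUT /my-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "ef_construction": 512,
            "m": 16
          }
        }
      }
    }
  }
}
```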
The IndexOutOfBoundsException fix is not in 2.12; 2.12 only contains the fix for nested queries. If you look at my RCA here: https://github.com/opensearch-project/neural-search/issues/497#issuecomment-1862259133, it shows that the issue will happen if you have 2 queries which are the same. So, check your array of hybrid queries and see if there are duplicates. If yes, remove them; that can be a short-term fix from your side while we decide how to handle duplicate queries.
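For illustration, the duplicate case described in the RCA looks roughly like this, with two identical entries in the queries array of the hybrid clause (field name and query text are placeholders):

```json
{
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "title": { "query": "wireless headphones" } } },
        { "match": { "title": { "query": "wireless headphones" } } }
      ]
    }
  }
}
```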
@navneet1v I saw your comment about having 2 queries that are the same, but it's not the case. I have only 2 queries, one for neural search and the other for lexical search. One of them uses a model, and the other uses a bunch of matching rules.
Also, I'm reporting here that I got the IndexOutOfBoundsException just by changing the ef_construction parameter when recreating the index. I didn't change the query, so the same query that was working before failed with a distinct ef_construction parameter. That's why I don't think duplicate queries are the issue for me.
Thanks, I will check this new error trace. BTW, it's pretty counterintuitive that ef_construction is creating a problem, and at query time no less; this is a parameter that is only used during index build time.
I am thinking this could be happening due to a high value of ef_construction only if the overall memory of the system is under stress. To better understand this, can you share a few more details:
- JVM heap size of Opensearch
- Which KNN engine are you using: nmslib, faiss, or lucene?
- Machine/Docker RAM where you are running the Opensearch process.
@tiagoshin
@navneet1v you're right, this issue in particular is caused by system memory constraints. I increased the JVM heap size and it worked, thank you! However, it's worth noting that the other issue keeps happening.
Thanks for the response. I am working on that issue and doing some more validation before I post the root cause and the fix.
So I was able to get to the root cause of the issue mentioned here (https://github.com/opensearch-project/neural-search/issues/497#issuecomment-1863499052):
Hi @navneet1v, thank you very much for your attention. I'm testing the 2.12.0 RC build and now I'm getting different errors. For all queries, when I perform hybrid search, I get:
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "test",
"node": "si1uOQWhRMWsWbFC6kaKjg",
"reason": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "totalHitsThreshold must be less than max integer value"
}
}
},
"status": 400
}
So the first issue, where we are seeing "totalHitsThreshold must be less than max integer value", is coming from this line: https://github.com/opensearch-project/neural-search/blob/63fe67f9e4e8c1a8ecf65b83db0509056120cecd/src/main/java/org/opensearch/neuralsearch/search/HitsThresholdChecker.java#L27-L29
This case happens when track_total_hits: true is added in the search request rather than an integer value. When track_total_hits is true, the total hits threshold becomes Integer.MAX_VALUE and hence the check fails. I checked that track_total_hits: true works with other query clauses, so I think we can remove the check. I will go ahead and fix this.
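To make the failing case concrete, a request of roughly this shape hits the check, because track_total_hits: true is resolved internally to a totalHitsThreshold of Integer.MAX_VALUE (index name, pipeline name, field names, query text, and model_id are placeholders, not the reporter's actual request):

```json
GET /my-index/_search?search_pipeline=hybrid-norm-pipeline
{
  "track_total_hits": true,
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "text": { "query": "example query" } } },
        {
          "neural": {
            "embedding": {
              "query_text": "example query",
              "model_id": "<model-id>",
              "k": 100
            }
          }
        }
      ]
    }
  }
}
```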
For the second issue, where track_total_hits was 50000, @tiagoshin can you provide me this info:
- How many shards were you using?
- How many data nodes are you using?
- How many total documents were there in the index?
@navneet1v
- I'm using two shards:
index shard prirep state docs store ip node
.plugins-ml-model-group 0 p STARTED 1 12.5kb 172.18.0.3 node-1
.plugins-ml-model-group 0 r STARTED 1 5.5kb 172.18.0.2 node-2
.plugins-ml-config 0 p STARTED 1 3.9kb 172.18.0.3 node-1
.plugins-ml-config 0 r STARTED 1 3.9kb 172.18.0.2 node-2
.plugins-ml-model 0 p STARTED 11 115.8mb 172.18.0.3 node-1
.plugins-ml-model 0 r STARTED 11 115.9mb 172.18.0.2 node-2
.plugins-ml-task 0 p STARTED 2 44.4kb 172.18.0.3 node-1
.plugins-ml-task 0 r STARTED 2 36.8kb 172.18.0.2 node-2
test 0 p STARTED 75997 190mb 172.18.0.3 node-1
test 0 r STARTED 75997 191.9mb 172.18.0.2 node-2
- I'm using 2 data nodes
- I have 75997 documents:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .plugins-ml-model-group 8radM4MFTvSD0ml76PrneA 1 1 1 0 18kb 12.5kb
green open .plugins-ml-config 4KiNinC1QTCmEwmOwK-omw 1 1 1 0 7.8kb 3.9kb
green open .plugins-ml-model RYKMfd3KTj2OfiK4madWyw 1 1 11 0 231.8mb 115.8mb
green open .plugins-ml-task E_rRWs4vSuulZ6U6n2FY9g 1 1 2 0 81.2kb 44.4kb
green open test GSRltbzPQVeJ5h7MoxYSdg 1 1 75997 21976 382mb 190mb
Actually, you are using 1 primary shard; the other shard is a replica of the first one. But thanks for this information. The code path that results in the issue you are getting when you set track_total_hits: 50000 can only be hit when there is 1 primary shard.
Just to resolve the issue for now, can you try with more than 1 primary shard and see if you still face the issue when track_total_hits: 50000? I am hoping you won't.
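For reference, something along these lines recreates the index with 2 primary shards. The number of primary shards can't be changed on an existing index, so the index has to be recreated and reindexed, and the knn settings and field mappings from the original index would also need to be carried over; the index name and replica count below are placeholders:

```json
PUT /test-2shards
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}
```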
@navneet1v It worked when I increased the primary shards to 2, thank you very much! What's your advice if I increase the number of replicas as well?