neural-search
[FEATURE] Sentence Highlighter
Is your feature request related to a problem?
No
What solution would you like?
I would like to have a highlighter that supports the neural search capability. It should highlight the most relevant sentences in the documents returned by a neural search.
What alternatives have you considered?
There are no available alternatives at the moment. So the only choice is to develop one.
Do you have any additional context?
I tried to implement it myself but faced the following challenges:
- I had to implement my own neural-search plugin, since this one relies on KNNQuery, which does not store the query text. For example, in the snippet below, fieldContext.context.query() returns an instance of KNNQuery. I suggest that the neural-search plugin have its own NeuralQuery that extends KNNQuery and keeps neural-search-related attributes such as the query text. I hope there are other ways to get the query text at highlight time.
@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    // Prints a KNNQuery instance
    System.out.println("Query: " + fieldContext.context.query());
}
- The inferenceSentences method is asynchronous and notifies an ActionListener after the result is retrieved. If I call it inside the highlight method above, the highlight method returns before the ActionListener is notified, so it cannot get the embeddings needed to compute sentence similarity and pick the sentence to highlight. I had to implement my own synchronous inferSentences. Below is pseudocode of what I am trying to do.
@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    System.out.println("highlighting..");
    String queryText = /* get query text from fieldContext.context.query() */;
    List<String> sentences = /* get sentences from search hit */;
    // Prepend the query so that its embedding ends up at index 0
    sentences.add(0, queryText);
    // Synchronous variant of the inference call (my own implementation)
    List<List<Float>> vectors = clientAccessor.inferSentences("U3R9CYcBOk2JRjrls0nH", sentences);
    List<Float[]> embeddings = new ArrayList<>();
    for (List<Float> v : vectors) {
        embeddings.add(v.toArray(new Float[0]));
    }
    System.out.println("Computing similarity");
    double maxSim = 0;
    String maxSentence = null;
    if (embeddings.size() > 0) {
        Float[] queryEmbedding = embeddings.get(0);
        for (int i = 1; i < embeddings.size(); i++) {
            double sim = cosineSim(queryEmbedding, embeddings.get(i));
            // Keep the sentence with the highest similarity to the query
            if (maxSentence == null || sim > maxSim) {
                maxSim = sim;
                maxSentence = sentences.get(i);
            }
        }
    }
    List<Text> responses = new ArrayList<>();
    if (maxSentence != null) {
        responses.add(new Text(maxSentence));
    }
    return new HighlightField(fieldContext.fieldName, responses.toArray(new Text[0]));
}
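For illustration only, the two helpers that the pseudocode above relies on (a synchronous inference call and a cosine similarity function) could look roughly like the sketch below. It is not the plugin's implementation: `HighlightHelpers`, `Listener`, `AsyncInference`, and `inferSentencesBlocking` are hypothetical names, and `Listener` is only a stand-in for a callback interface such as ActionListener. The sketch assumes the asynchronous call notifies the listener exactly once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class HighlightHelpers {

    /** Hypothetical stand-in for a callback interface such as ActionListener. */
    public interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for the asynchronous inference call. */
    public interface AsyncInference {
        void inferSentences(String modelId, List<String> sentences,
                            Listener<List<List<Float>>> listener);
    }

    /** Waits until the listener is notified, or throws once the timeout expires. */
    public static List<List<Float>> inferSentencesBlocking(
            AsyncInference inference, String modelId,
            List<String> sentences, long timeoutMillis) throws Exception {
        CompletableFuture<List<List<Float>>> future = new CompletableFuture<>();
        inference.inferSentences(modelId, sentences, new Listener<List<List<Float>>>() {
            @Override
            public void onResponse(List<List<Float>> response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Cosine similarity of two vectors; returns 0 if either vector has zero norm. */
    public static double cosineSim(List<Float> a, List<Float> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) throws Exception {
        // Fake model: replies on another thread with fixed 2-d vectors
        // (query first, then one vector per sentence).
        AsyncInference fake = (modelId, sentences, listener) ->
            new Thread(() -> listener.onResponse(Arrays.asList(
                Arrays.asList(1f, 0f),
                Arrays.asList(0f, 1f),
                Arrays.asList(2f, 0.1f)))).start();

        List<List<Float>> v = inferSentencesBlocking(
            fake, "hypothetical-model-id", Arrays.asList("query", "s1", "s2"), 5000);
        System.out.println("sim(query, s1) = " + cosineSim(v.get(0), v.get(1)));
        System.out.println("sim(query, s2) = " + cosineSim(v.get(0), v.get(2)));
    }
}
```

Blocking inside a highlighter may tie up a search thread, and could stall if the callback is scheduled on the same thread pool, so the timeout and the threading assumptions would need to be checked against the real plugin.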
Having said the above, I hope you can tell me which route to take here. Is this feature going to be available in the plugin any time soon?
Thanks
@asfoorial This is an interesting feature, and I remember the same request for the highlight feature at the time the RFC was created for this plugin.
Is this feature going to be available in the plugin any time soon?
The highlight feature was not on our roadmap, as the team was busy making the plugin GA, but we would really like this feature to be present in the plugin.
@asfoorial I need to take a deeper look at the approaches you suggested to see whether they are feasible. In the meantime, can you describe the use case you are trying to solve with the highlight feature?
Please +1 if you are looking for this feature to help prioritize
+1 for highlight over neural searches and hybrid searches.
I think this could be helpful when building RAG-based workflows, where you want to extract just the portion of a larger document that is being matched.
[Catch All Triage - 1, 2, 3, 4]
Created corresponding issue for hybrid query. https://github.com/opensearch-project/neural-search/issues/1215
@vibrantvarun To clarify, the hybrid query discussed in #1215 cannot utilize OpenSearch's existing highlight capabilities (as documented here). This is distinct from issue #145, which requests a new semantic highlighting feature that I am currently implementing. cc: @heemin32
@junqiu-lei yes I am aware of it.
Resolving this issue, as the semantic highlighter feature is being released in OpenSearch 3.0.0.