[FEATURE] Sentence Highlighter

Open asfoorial opened this issue 2 years ago • 3 comments

Is your feature request related to a problem?

No

What solution would you like?

I would like a highlighter that supports the neural search capability. It should highlight the most relevant sentences in the documents returned by a neural search.

What alternatives have you considered?

There are no available alternatives at the moment. So the only choice is to develop one.

Do you have any additional context?

I tried to implement it myself but faced the following challenges:

  1. I had to implement my own neural-search plugin, since this one relies on KNNQuery, which does not store the query text. For example, in the snippet below, fieldContext.context.query() returns an instance of KNNQuery. I suggest that the neural-search plugin have its own NeuralQuery that extends KNNQuery and keeps neural-search-related attributes such as the query text. I hope there are other ways to get the query text at highlight time.

@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    System.out.println("Query: " + fieldContext.context.query());
}

  2. The inferenceSentences method is asynchronous and notifies an ActionListener once the result is retrieved. If I call it inside the highlight method above, highlight returns before the ActionListener is notified, so the embeddings needed to compute sentence similarity and pick the sentence to highlight are not available. I had to implement my own synchronous inferSentences. Below is pseudocode for what I am trying to do.

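@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    System.out.println("highlighting..");
    List<Text> responses = new ArrayList<>();

    // Pseudocode: getQueryText is a placeholder for getting the query text
    // from fieldContext.context.query()
    String queryText = getQueryText(fieldContext.context.query());
    // Pseudocode: getSentences is a placeholder for getting sentences from the search hit
    List<String> sentences = getSentences(fieldContext);

    // Embed the query together with the sentences; the query goes first
    List<String> inputs = new ArrayList<>();
    inputs.add(queryText);
    inputs.addAll(sentences);
    // inferSentences is my own synchronous variant of inferenceSentences
    List<List<Float>> vectors = clientAccessor.inferSentences("U3R9CYcBOk2JRjrls0nH", inputs);

    List<Float[]> embeddings = new ArrayList<>();
    for (List<Float> v : vectors) {
        embeddings.add(v.toArray(new Float[0]));
    }

    System.out.println("Computing similarity");
    double maxSim = 0;
    String maxSentence = null;
    if (!embeddings.isEmpty()) {
        Float[] queryEmbedding = embeddings.get(0);
        for (int i = 1; i < embeddings.size(); i++) {
            float sim = cosineSim(queryEmbedding, embeddings.get(i));
            if (sim > maxSim) {
                maxSim = sim;
                maxSentence = sentences.get(i - 1); // inputs are offset by one because of the query
            }
        }
    }
    if (maxSentence != null) {
        responses.add(new Text(maxSentence));
    }

    return new HighlightField(fieldContext.fieldName, responses.toArray(new Text[0]));
}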
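
For readers who want to try the idea outside the plugin, here is a minimal, self-contained Java sketch of the two pieces described above: waiting on a callback-style embedding call, and picking the sentence whose embedding has the highest cosine similarity to the query. Everything in it is an assumption made for illustration: AsyncEmbedder, embedBlocking, toyEmbedder and mostSimilarSentence are hypothetical names, not part of neural-search or ml-commons, and the toy embedder just counts the letters 'a' and 'b' instead of calling a model. It requires Java 9 or later for CompletableFuture.orTimeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class SentenceHighlightSketch {

    /** Hypothetical stand-in for an asynchronous, callback-based embedding API. */
    public interface AsyncEmbedder {
        void embed(List<String> texts, Consumer<List<float[]>> onResponse, Consumer<Exception> onFailure);
    }

    /** Bridges the callback API to a blocking call; fails if no result arrives within the timeout. */
    public static List<float[]> embedBlocking(AsyncEmbedder embedder, List<String> texts, long timeoutMillis) {
        CompletableFuture<List<float[]>> future = new CompletableFuture<>();
        embedder.embed(texts, future::complete, future::completeExceptionally);
        return future.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
    }

    /** Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros. */
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Embeds the query and the sentences in one call and returns the sentence closest to the query. */
    public static String mostSimilarSentence(AsyncEmbedder embedder, String query, List<String> sentences) {
        if (sentences.isEmpty()) {
            return null;
        }
        List<String> inputs = new ArrayList<>();
        inputs.add(query); // the query embedding ends up at index 0
        inputs.addAll(sentences);
        List<float[]> vectors = embedBlocking(embedder, inputs, 5_000);

        float[] queryVector = vectors.get(0);
        double best = Double.NEGATIVE_INFINITY;
        String bestSentence = null;
        for (int i = 1; i < vectors.size(); i++) {
            double sim = cosineSimilarity(queryVector, vectors.get(i));
            if (sim > best) {
                best = sim;
                bestSentence = sentences.get(i - 1); // undo the offset introduced by the query
            }
        }
        return bestSentence;
    }

    /** Toy embedder (assumption, not a real model): maps text to [count of 'a', count of 'b'] on another thread. */
    public static AsyncEmbedder toyEmbedder() {
        return (texts, onResponse, onFailure) -> CompletableFuture.runAsync(() -> {
            List<float[]> out = new ArrayList<>();
            for (String text : texts) {
                float a = 0, b = 0;
                for (char c : text.toCharArray()) {
                    if (c == 'a') {
                        a++;
                    } else if (c == 'b') {
                        b++;
                    }
                }
                out.add(new float[] { a, b });
            }
            onResponse.accept(out);
        });
    }

    public static void main(String[] args) {
        String best = mostSimilarSentence(toyEmbedder(), "aab", List.of("bbb", "aaab"));
        System.out.println("Most similar sentence: " + best);
    }
}
```

Blocking inside a highlighter ties up the calling thread until the model responds, so this pattern is only a way to experiment with the approach; whether it is acceptable inside the search path is a separate design question.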

Having said the above, I hope you can advise on the best route to take here. Is this feature going to be available in the plugin any time soon?

Thanks

asfoorial avatar Mar 23 '23 02:03 asfoorial

@asfoorial This is an interesting feature, and I remember the same request for the highlight feature at the time the RFC was created for this plugin.

Is this feature going to be available in the plugin any time soon?

The highlight feature was not on our roadmap, as the team was busy making the plugin GA, but we would really like this feature to be present in the plugin.

@asfoorial on the approaches suggested, I need to take a deeper look to see whether they are feasible. In the meantime, can you describe the use case you are trying to solve with the highlight feature?

navneet1v avatar Mar 23 '23 19:03 navneet1v

Please +1 if you are looking for this feature to help prioritize

vamshin avatar Mar 28 '23 00:03 vamshin

+1 for highlight over neural searches and hybrid searches.

I think this could be helpful when building RAG-based workflows when you're trying to export portions of larger documents to extract just the portion of the text that's being matched.

dswitzer avatar Dec 13 '23 20:12 dswitzer

[Catch All Triage - 1, 2, 3, 4]

dblock avatar Jan 06 '25 17:01 dblock

Created corresponding issue for hybrid query. https://github.com/opensearch-project/neural-search/issues/1215

vibrantvarun avatar Mar 06 '25 22:03 vibrantvarun

Created corresponding issue for hybrid query. #1215

@vibrantvarun To clarify, the hybrid query discussed in #1215 cannot utilize OpenSearch's existing highlight capabilities (as documented here). This is distinct from issue #145, which requests a new semantic highlighting feature that I am currently implementing. cc: @heemin32

junqiu-lei avatar Mar 07 '25 17:03 junqiu-lei

@junqiu-lei yes I am aware of it.

vibrantvarun avatar Mar 07 '25 18:03 vibrantvarun

Resolving this issue, as the semantic highlighter feature is being released in OpenSearch 3.0.0.

junqiu-lei avatar Apr 21 '25 23:04 junqiu-lei