llama_index

query for relevant text snippets

Open firasd opened this issue 2 years ago • 2 comments

Hi,

Is it possible to get just a list of the nodes closest to the prompt? For example, search for "green car" and see a list of the closest matching text fragments.

I see there's a function called "get_nodes_and_similarities_for_response", but I'm not sure how to call it when using GPTSimpleVectorIndex.

firasd, Feb 04 '23 13:02

Was just looking into this! I wonder whether similarity_search() from LangChain can be used here 🤔

It's sometimes useful to see the source text, so it would be convenient to have this easily accessible in gpt_index.

And for reference, with LangChain one way to do it is:

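from langchain.vectorstores import FAISS

# texts: a list of strings; embeddings: a LangChain embeddings object (both defined elsewhere)
docsearch = FAISS.from_texts(texts, embeddings)
results = docsearch.similarity_search("<similar to text here>")
for item in results:
    print(item)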

batmanscode, Feb 04 '23 15:02

You mean only get the top k results and not the answer? Here is how I do it:

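from gpt_index import GPTSimpleVectorIndex

def query_index(question, index_path, top_k):
    # Load the index from disk
    index = GPTSimpleVectorIndex.load_from_disk(index_path)

    # Query the index; "no_text" retrieves nodes without generating an answer
    response = index.query(question, response_mode="no_text", similarity_top_k=top_k, verbose=True)

    # Collect the similarity score and source text of each retrieved node
    relevant_source_text = []
    for node in response.source_nodes:
        similarity = node.similarity
        text = node.source_text
        # Use a dict with explicit keys; {similarity, text} would build an unordered set
        relevant_source_text.append({"similarity": similarity, "text": text})

    return relevant_source_text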

Setting response_mode to "no_text" skips the completion call that would generate an answer, so you can read the top k results directly from response.source_nodes.
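
For intuition about what a top k similarity lookup returns, here is a minimal, self-contained sketch. It is not gpt_index's implementation: it assumes snippets are ranked by cosine similarity, and the toy word-count `embed` function and the `top_k_snippets` helper are hypothetical stand-ins for a real embedding model and the index's retrieval step.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_snippets(query, snippets, top_k=2):
    """Return the top_k snippets most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [
        {"similarity": cosine_similarity(query_vec, embed(s)), "text": s}
        for s in snippets
    ]
    scored.sort(key=lambda item: item["similarity"], reverse=True)
    return scored[:top_k]


snippets = [
    "the green car is parked outside",
    "a red bicycle leans on the wall",
    "she bought a green apple",
]
results = top_k_snippets("green car", snippets, top_k=2)
for item in results:
    print(f"{item['similarity']:.3f}  {item['text']}")
```

Each result is a dict with "similarity" and "text" keys, so the caller can sort, filter by a score threshold, or display the matching fragments without asking the LLM for an answer.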

ymansurozer, Feb 05 '23 12:02

That works, thank you!

firasd, Feb 05 '23 19:02

Thanks! 😃

batmanscode, Feb 05 '23 19:02