llama_index
query for relevant text snippets
Hi,
Is it possible to just get a list of the closest nodes to the prompt? For example, search for "green car" and just see a list of the closest matching text fragments.
I see there's a function called "get_nodes_and_similarities_for_response", but I'm not sure how to call it when using GPTSimpleVectorIndex.
Was just looking into this! I wonder whether similarity_search() from LangChain can be used here 🤔
It's useful to see the source text sometimes. It would be convenient to have this easily accessible in gpt_index.
And for reference, with LangChain one way to do it is:
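from langchain.vectorstores import FAISS

# texts: a list of strings; embeddings: an embeddings object (both defined elsewhere)
docsearch = FAISS.from_texts(texts, embeddings)
results = docsearch.similarity_search("<similar to text here>")
for item in results:
    print(item)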
You mean only get the top k results and not the answer? Here is how I do it:
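from llama_index import GPTSimpleVectorIndex  # older releases: from gpt_index import GPTSimpleVectorIndex

def query_index(question, index_path, top_k):
    # Load the saved index from disk
    index = GPTSimpleVectorIndex.load_from_disk(index_path)
    # Query the index; "no_text" skips the completion step
    response = index.query(question, response_mode="no_text", similarity_top_k=top_k, verbose=True)
    # Collect the similarity score and source text of each retrieved node
    relevant_source_text = []
    for node in response.source_nodes:
        similarity = node.similarity
        text = node.source_text
        # Store score and text under explicit keys
        relevant_source_text.append({"similarity": similarity, "text": text})
    return relevant_source_text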
So if you set the response_mode to "no_text" it will not use completions to answer it and you will be able to just access the top k results via response.source_nodes.
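For intuition about what a top-k similarity lookup returns, here is a minimal sketch in plain Python. It is not how the library implements retrieval; the helper names (cosine_similarity, top_k_fragments) and the 3-dimensional toy "embeddings" are made up for illustration, and real embeddings would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def top_k_fragments(query_vec, fragments, top_k=2):
    """Return the top_k (similarity, text) pairs, most similar first.

    fragments is a list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in fragments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy vectors invented for this example only.
fragments = [
    ("a green car parked outside", [0.9, 0.1, 0.0]),
    ("a red bicycle", [0.1, 0.9, 0.0]),
    ("a dark green truck", [0.7, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedding of "green car"

results = top_k_fragments(query_vec, fragments, top_k=2)
for similarity, text in results:
    print(f"{similarity:.3f}  {text}")
```

The result is a list of scored text fragments with no generated answer, which is the same kind of output the "no_text" approach gives you through response.source_nodes.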
That works, thank you!
Thanks! 😃