BUG: KNearestNeighbors Not Working
Using the .NET example code, it appears that KNearestNeighbors does not return more than a single result.
Example Code:
internal static async Task SingleVectorSearch(SearchClient searchClient, OpenAIClient openAIClient, string query, int k = 3, int nearestNeighbors = 3)
{
    // Generate the embedding for the query
    var queryEmbeddings = await SemanticFunctions.GenerateEmbeddings(query, openAIClient);

    // Perform the vector similarity search
    var searchOptions = new SearchOptions
    {
        Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = nearestNeighbors, Fields = { "contentVector" } } },
        Size = k,
        Select = { "id", "title", "content", "category", "url" },
    };

    SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

    int count = 0;
    await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
    {
        count++;
        // for (int i = 0; i < nearestNeighbors; i++)
        // {
        Console.WriteLine($"Id: {result.Document["id"]}");
        Console.WriteLine($"Title: {result.Document["title"]}");
        Console.WriteLine($"Score: {result.Score}\n");
        Console.WriteLine($"Content: {result.Document["content"]}");
        Console.WriteLine($"Category: {result.Document["category"]}\n\n");
        // }
    }
    Console.WriteLine($"Total Results: {count}");
}
The index I am using is somewhat atypical, and I wonder if that may be the cause. Here is an example of a result:
I have confirmed that this is indeed a single chunk of text in this particular record. I would expect KNearestNeighbors to concatenate the "content" field values of the n records before and after the match into a single string returned for this record. Perhaps that is not the intent, or the search options are not set up correctly. Please advise, thank you!
In addition, I am happy to roll my own nearest-neighbor content field concatenation, but I could not figure out how to search on just the Id of a record either.
Any update on this? Has this been confirmed as a bug?
Hi @Mano1192, how many total documents (post-chunking) do you have in your index? If you aren't applying any filtering, which I don't see in your code above, you should get back as many results as your KNearestNeighborsCount value. Additionally, your 'Size' is set equal to your k, so in this case you should be getting back 3 results from your index.
I'm not sure we're on the same page. NearestNeighbors != Size in your SDK example; here is an excerpt:
// Perform the vector similarity search
var searchOptions = new SearchOptions
{
    Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = 3, Fields = { "contentVector" } } },
    Size = 10,
    QueryType = SearchQueryType.Semantic,
    QueryLanguage = QueryLanguage.EnUs,
    SemanticConfigurationName = SemanticSearchConfigName,
    QueryCaption = QueryCaptionType.Extractive,
    QueryAnswer = QueryAnswerType.Extractive,
    QueryCaptionHighlightEnabled = true,
    Select = { "title", "content", "category" },
};
Nearest neighbors should take each result and grab the index neighbors of that result. E.g., if my result id == 5, then NN = 2 should retrieve ids 3, 4, 5, 6, 7, that is, 2 above and 2 below the returned index. The "Size" parameter is then how many search matches it hits on. With text chunking, a single document could be 100 chunks, and the top related vector may only be a sentence or two, depending on the chunking technique. The nearest neighbors parameter in other vector DBs like Pinecone or Weaviate returns the surrounding indexes in order to give a RAG pattern more cohesive data to formulate its response with.
Does this make sense, or does NN in ACS not work in the manner I described?
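For reference, if the service does not do this expansion itself, the id == 5, NN = 2 example above could be handled client-side. The following is only a minimal sketch: `NeighborIds` and `BuildIdFilter` are hypothetical helper names, and it assumes the ids are consecutive integers in chunk order and that the `id` field is marked filterable in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: ids of the chunks within `window` positions of a hit.
// Assumes ids are consecutive integers in chunk order, starting at `minId`.
static IReadOnlyList<int> NeighborIds(int hitId, int window, int minId = 0)
{
    int start = Math.Max(minId, hitId - window);
    int end = hitId + window;
    return Enumerable.Range(start, Math.Max(0, end - start + 1)).ToList();
}

// Hypothetical helper: OData filter that matches any of the given keys.
// Assumes the key field is marked filterable in the index.
static string BuildIdFilter(IEnumerable<int> ids, string keyField = "id") =>
    $"search.in({keyField}, '{string.Join(",", ids)}', ',')";

var ids = NeighborIds(hitId: 5, window: 2);
Console.WriteLine(string.Join(", ", ids));   // 3, 4, 5, 6, 7
Console.WriteLine(BuildIdFilter(ids));       // search.in(id, '3,4,5,6,7', ',')
```

The resulting string could be assigned to `SearchOptions.Filter` on a follow-up query without vectors, and the returned "content" values concatenated in id order. For a single record, `searchClient.GetDocumentAsync<SearchDocument>("5")` looks a document up by its key.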