azure-search-vector-samples

BUG: KNearestNeighbors Not Working

Open Mano1192 opened this issue 2 years ago • 4 comments

Using the .NET example code, KNearestNeighborsCount does not appear to return more than a single result.

Example Code:

    internal static async Task SingleVectorSearch(SearchClient searchClient, OpenAIClient openAIClient, string query, int k = 3, int nearestNeighbors = 3)
    {
        // Generate the embedding for the query
        var queryEmbeddings = await SemanticFunctions.GenerateEmbeddings(query, openAIClient);

        // Perform the vector similarity search
        var searchOptions = new SearchOptions
        {
            Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = nearestNeighbors, Fields = { "contentVector" } } },
            Size = k,
            Select = { "id", "title", "content", "category", "url" },
        };

        SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

        int count = 0;
        await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
        {
            count++;
            // for (int i = 0; i < nearestNeighbors; i++)
            // {
            Console.WriteLine($"Id: {result.Document["id"]}");
            Console.WriteLine($"Title: {result.Document["title"]}");
            Console.WriteLine($"Score: {result.Score}\n");
            Console.WriteLine($"Content: {result.Document["content"]}");
            Console.WriteLine($"Category: {result.Document["category"]}\n\n");
            // }
        }
        Console.WriteLine($"Total Results: {count}");
    }

The index I am using is a bit atypical, and I wonder if that may be the cause. Here is an example of a result: [image]. I have confirmed that it is indeed a single chunk of text in this particular record. I would expect KNearestNeighbors to add the "content" field values of the n records before and after this one, returned as a single string for this record. Perhaps that is not the intent, or the search options are not set up correctly. Please advise, thank you!

Mano1192, Aug 24 '23 05:08

In addition, I am happy to roll my own nearest-neighbor concatenation of the content field, but I could not figure out how to search on just the Id of a record either.

Mano1192, Aug 24 '23 05:08

Any update on this? Has this been confirmed a bug?

Mano1192, Aug 30 '23 16:08

Hi @Mano1192, how many total documents (post-chunking) do you have in your index? If you aren't applying any filtering, which I don't see in your code above, you should get back as many results as your KNearestNeighborsCount value. Additionally, your 'Size' is set equal to your k, so in this case you should be getting back 3 results from your index.
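To make the two settings concrete, here is a minimal, self-contained sketch of how they appear to interact in a pure vector query: the k chunks closest to the query vector are selected first, and Size then caps how many of those come back. It is a brute-force in-memory search, not a call to the service; `KnnSketch`, `Cosine`, and `Search` are hypothetical names for illustration and are not part of the Azure SDK, and the service itself may use an approximate index rather than this exhaustive scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: not part of Azure.Search.Documents.
public static class KnnSketch
{
    // Cosine similarity between two vectors of equal length.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Step 1: keep the k chunks most similar to the query vector.
    // Step 2: cap the final result list at `size`.
    // Returns chunk ids, most similar first.
    public static List<string> Search(
        IReadOnlyDictionary<string, float[]> index, float[] query, int k, int size)
    {
        return index
            .OrderByDescending(entry => Cosine(entry.Value, query))
            .Take(k)
            .Take(size)
            .Select(entry => entry.Key)
            .ToList();
    }
}
```

Under this reading, k = 3 with Size = 10 yields at most three results, and they are the three chunks closest to the query in vector space, wherever they sit in the source document.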

farzad528, Sep 01 '23 10:09

I'm not sure we're on the same page. NearestNeighbors != Size in your SDK example. Here is an excerpt:

    // Perform the vector similarity search
    var searchOptions = new SearchOptions
    {
        Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = 3, Fields = { "contentVector" } } },
        Size = 10,
        QueryType = SearchQueryType.Semantic,
        QueryLanguage = QueryLanguage.EnUs,
        SemanticConfigurationName = SemanticSearchConfigName,
        QueryCaption = QueryCaptionType.Extractive,
        QueryAnswer = QueryAnswerType.Extractive,
        QueryCaptionHighlightEnabled = true,
        Select = { "title", "content", "category" },
    };

Nearest neighbors should take each result and grab that result's neighbors in the index. E.g., if my result id == 5, then NN = 2 should retrieve ids 3, 4, 5, 6, 7, i.e. 2 above and 2 below the returned index. The "Size" parameter is then how many search matches are returned. With text chunking, a single document could be 100 chunks, and the top related vector may only be a sentence or two, depending on the chunking technique. The nearest-neighbors parameter in other vector DBs like Pinecone or Weaviate returns the surrounding indexes in order to give a RAG pattern more cohesive data to formulate its response with.

Does this make sense, or does NN in ACS not work in the manner I described?
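The surrounding-chunk behavior described here (a hit on id 5 with a window of 2 yielding ids 3 through 7) can also be done client-side after the search returns. Below is a minimal sketch under loudly stated assumptions: chunk ids are consecutive integers in document order, which may not hold for a given index, and `ChunkWindow`, `NeighborIds`, `ExpandContext`, and the `lookup` delegate are hypothetical names. In the Azure.Search.Documents SDK, `SearchClient.GetDocumentAsync<T>(key)` fetches a single document by its key, so a wrapper around that call is one candidate for `lookup`; the sketch keeps the delegate synchronous for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: expands a search hit into a window of adjacent chunks.
public static class ChunkWindow
{
    // Ids of the chunks within `radius` positions of `hitId`, clamped to [minId, maxId].
    // Assumes chunk ids are consecutive integers in document order.
    public static IEnumerable<int> NeighborIds(int hitId, int radius, int minId, int maxId)
    {
        int start = Math.Max(minId, hitId - radius);
        int end = Math.Min(maxId, hitId + radius);
        if (end < start)
        {
            return Enumerable.Empty<int>();
        }
        return Enumerable.Range(start, end - start + 1);
    }

    // Joins the content of every chunk in the window into one string.
    // `lookup` is a caller-supplied stand-in for a fetch-by-key call;
    // ids for which it returns null or empty text are skipped.
    public static string ExpandContext(
        int hitId, int radius, int minId, int maxId, Func<int, string> lookup)
    {
        var parts = NeighborIds(hitId, radius, minId, maxId)
            .Select(id => lookup(id))
            .Where(text => !string.IsNullOrEmpty(text));
        return string.Join(" ", parts);
    }
}
```

The design choice is to keep retrieval and context expansion separate: the vector query finds the most relevant chunk, and the window size is then a client-side setting that can be tuned without re-indexing.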

Mano1192, Sep 08 '23 22:09