
Azure Cognitive Search "Hybrid Search + Semantic" Search - Is it supported?

Open slm-2015 opened this issue 2 years ago • 16 comments

Hi,

Can I use Azure Search "Hybrid Search + Semantic" Search using Kernel Memory? Or is only vector search possible?

Just looking at this article: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167. It suggests that this approach would provide better relevance in the returned memories.

Any pointers welcome.

Thanks, Simon.

slm-2015 avatar Nov 16 '23 09:11 slm-2015

Hi Simon, the feature is not available yet, but definitely on our radar, high priority I would say.

dluc avatar Nov 17 '23 00:11 dluc

Quick update: the IMemoryDb interface used by all memory connectors now exposes the user's input text query, so it's possible to write a custom connector using hybrid search and custom features.

dluc avatar Nov 30 '23 05:11 dluc

Couldn't AzureAISearchMemory be updated to support hybrid search? The appsettings could be updated to control the mode (hybrid or simple vector): AzureAISearchConfig. Since kernel-memory owns the schema definition, the search fields should be known.

crickman avatar Mar 06 '24 18:03 crickman

Just chiming in here to emphasize how important it is to support hybrid (and semantic) search in the library, as that would improve the RAG process a lot... 😄

luismanez avatar Mar 09 '24 09:03 luismanez

Thinking a bit more about it, what if, as a quick win, we add a flag in the AzureAISearchConfig class:

public bool UseHybridSearch { get; set; } = false;

When using DI, we can configure that flag when calling AddAzureAISearchAsMemoryDb(this IServiceCollection services, AzureAISearchConfig config)

Later, in the AzureAISearchMemory class, we store the config object in a private property:

private readonly AzureAISearchConfig _config;

// in the constructor
public AzureAISearchMemory(
        AzureAISearchConfig config,
        ITextEmbeddingGenerator embeddingGenerator,
        ILogger<AzureAISearchMemory>? log = null)
    {
        this._config = config;

And finally, in the GetSimilarListAsync, we check that config flag, and when using Hybrid, the query text is passed as keyword to the SearchClient:

try
        {
            var keyword = this._config.UseHybridSearch ? text : null;
            searchResult = await client
                .SearchAsync<AzureAISearchMemoryRecord>(keyword, options, cancellationToken: cancellationToken)
                .ConfigureAwait(false);
        }

As far as I've seen in the Azure Search SDK, providing both the keyword and the vector option should be enough to run the Hybrid search, and since the score is returned in the same @search.score property, the rest of the process shouldn't change.

@dluc if you think I'm right, I'd do the PR 😄

luismanez avatar Mar 09 '24 16:03 luismanez

@dluc (or team), any thoughts here? It would be very nice to have (at least) Hybrid search in KM, and if I'm not too wrong, it does not seem like a hard one to add.

Many thanks.

luismanez avatar Mar 20 '24 19:03 luismanez

@luismanez that looks good to me, nothing against :-) I was checking Azure AI Search config to see how complex indexes can be, and there are really a lot of permutations.

I'm assuming this change would not require changes to the index schema? KM creates indexes automatically, so I'm not sure if your proposal should include changes to the index creation code, and how complex that could be.

dluc avatar Mar 20 '24 20:03 dluc

OK, great!, thanks. I'll do the PR as soon as possible.

This does not require changes to the index schema. We are just enabling Hybrid search, and as far as I understand from here: https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#hybrid-query-request we just need to provide the vector and search parameters to run the Hybrid query (currently only the vector is provided).

In the (near? 😄) future, it would be great to also allow Semantic search, but that would require changes to the index schema, and it would be great to have the ability to provide a custom schema...

luismanez avatar Mar 21 '24 12:03 luismanez

Enabling Semantic Search would be fantastic!

dluc avatar Mar 21 '24 14:03 dluc

@dluc any plans to allow a custom search schema? Adding semantic search wouldn't be that hard in code, as I think we could take the same approach I described for Hybrid. However, the advantage of semantic search lies in its configuration and in linking it to specific index fields. With our current tags and payload fields, it won't work very well (all IMO and AFAIK).

luismanez avatar Mar 21 '24 15:03 luismanez

I looked into it a couple of times, I think we just don't have the time to implement it right now, sorry.

dluc avatar Mar 21 '24 21:03 dluc

@dluc I'm coding the Hybrid support but found an issue and need your thoughts. I've realised that Hybrid search returns pretty low Score values. Apparently this is by design, due to the Reciprocal Rank Fusion (RRF) algorithm. I found this doc: https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#ranking

The different ranking algorithms, HNSW's similarity metric and RRF in this case, produce scores that have different magnitudes. This behavior is by design. RRF scores can appear quite low, even with a high similarity match. Lower scores are a characteristic of the RRF algorithm. In a hybrid query with RRF, more of the reciprocal of the ranked documents are included in the results, given the relatively smaller score of the RRF ranked documents, as opposed to pure vector search.
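To make the magnitude concrete, here is a minimal sketch of how an RRF score adds up. It assumes the smoothing constant of 60 described in the Azure AI Search RRF documentation and 1-based ranks; the class and method names (RrfSketch, Score) are made up for illustration and are not part of Kernel Memory or the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfSketch
{
    // Assumed smoothing constant, taken from the Azure AI Search RRF description.
    public const double K = 60.0;

    // ranks: the 1-based position of one document in each result list that contains it.
    // Each list contributes 1 / (K + rank) to the fused score.
    public static double Score(IEnumerable<int> ranks) =>
        ranks.Sum(rank => 1.0 / (K + rank));
}
```

Under these assumptions, a document ranked first by both the keyword query and the vector query gets 1/61 + 1/61, roughly 0.033, which lines up with the 0.03 seen for a very relevant item and explains why a 0.5 threshold rejects everything.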

With this, when we process each result and compute the minimum distance using CosineSimilarity, none of the returned results pass the "filter" (note that even a very relevant result item has a score of just 0.03, so it is never greater than the default 0.5 minDistance):

var minDistance = CosineSimilarityToScore(minRelevance);
await foreach (SearchResult<AzureAISearchMemoryRecord>? doc in searchResult.Value.GetResultsAsync().ConfigureAwait(false))

With Hybrid, we shouldn't use the CosineSimilarity, but I don't know whether there is any other filter we want to apply, or whether it is just a matter of not calculating the minDistance and not doing this:

if (doc == null || doc.Score < minDistance) { continue; }

So, for Hybrid, shall we just remove the IF and return the Score exactly as Azure Search returns it? Instead of:

        var minDistance = CosineSimilarityToScore(minRelevance);
        await foreach (SearchResult<AzureAISearchMemoryRecord>? doc in searchResult.Value.GetResultsAsync().ConfigureAwait(false))
        {
            if (doc == null || doc.Score < minDistance) { continue; }

            MemoryRecord memoryRecord = doc.Document.ToMemoryRecord(withEmbeddings);

            yield return (memoryRecord, ScoreToCosineSimilarity(doc.Score ?? 0));
        }

we do:

        var minDistance = CosineSimilarityToScore(minRelevance);
        await foreach (SearchResult<AzureAISearchMemoryRecord>? doc in searchResult.Value.GetResultsAsync().ConfigureAwait(false))
        {
            // Always skip null results; apply the cosine threshold only for pure vector search.
            if (doc == null) { continue; }
            if (!this._config.UseHybridSearch && doc.Score < minDistance) { continue; }

            MemoryRecord memoryRecord = doc.Document.ToMemoryRecord(withEmbeddings);

            var score = doc.Score ?? 0;
            score = this._config.UseHybridSearch ? score : ScoreToCosineSimilarity(score);
            yield return (memoryRecord, score);
        }

What do you think?

UPDATE: the problem with this approach, is that we have a k parameter for Vector queries, but we do NOT have a top parameter for Hybrid, and I think we need that TOP. Otherwise, Azure Search is returning 1000 items!! https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#number-of-results
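One possible way to cap the result count is sketched below, assuming the GA Azure.Search.Documents SDK (11.5 or later), where SearchOptions.Size limits the number of returned documents and VectorizedQuery.KNearestNeighborsCount plays the role of k. The helper name BuildOptions and the vectorField parameter are hypothetical, and the SDK version used by the connector at the time may expose a different shape.

```csharp
using System;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

public static class HybridQuerySketch
{
    // Builds options for a hybrid request. The vector side is limited by k,
    // and Size limits how many fused results the service returns.
    // vectorField is a placeholder for the index's embedding field name.
    public static SearchOptions BuildOptions(ReadOnlyMemory<float> embedding, string vectorField, int limit)
    {
        var vectorQuery = new VectorizedQuery(embedding) { KNearestNeighborsCount = limit };
        vectorQuery.Fields.Add(vectorField);

        var options = new SearchOptions { Size = limit, VectorSearch = new VectorSearchOptions() };
        options.VectorSearch.Queries.Add(vectorQuery);
        return options;
    }
}
```

The resulting options would then be passed together with the query text, for example client.SearchAsync<AzureAISearchMemoryRecord>(text, options, cancellationToken), so that both the keyword and the vector parts of the request share the same limit.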

luismanez avatar Mar 26 '24 21:03 luismanez

Any thoughts here @dluc ? sorry to bother, but I think would be pretty interesting to support at least Hybrid search. Thanks.

luismanez avatar Apr 01 '24 19:04 luismanez

"With Hybrid, we shouldn't use the CosineSimilarity"

That's a big change. The user needs to be aware that relevance scores are very different; otherwise everyone will complain that they are getting zero results when using the usual thresholds like 0.5 or 0.7.

I think hybrid search should be explicitly enabled in the configuration, and the user should somehow be informed.

"So, when Hybrid, shall we just remove the IF, and just return the same Score that is returned by Azure Search?"

Only when hybrid. In this case we might as well remove CosineSimilarityToScore(minRelevance); and set minDistance = minRelevance.

dluc avatar Apr 08 '24 15:04 dluc

Any news on this? This is a much needed feature!

heidarj avatar Apr 20 '24 13:04 heidarj

@dluc I've sent a PR to add Azure AI search Hybrid search support to the library (only Hybrid, Semantic support is much more complex). https://github.com/microsoft/kernel-memory/pull/428

The idea is to have a new Config property in the AzureAISearchConfig class, so Hybrid is only enabled explicitly. When enabled, the CosineSimilarity is not calculated and the minDistance is set to the minRelevance parameter (passed from the top SearchAsync method).

Also note that the Document score, when using Hybrid, is set to the Score value returned directly from Azure AI Search.

var documentScore = this._config.UseHybridSearch ? doc.Score ?? 0 : ScoreToCosineSimilarity(doc.Score ?? 0);
yield return (memoryRecord, documentScore);
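The filtering described above can be sketched in isolation as follows. This is not the code from the PR: it works on raw scores only, and the two conversion helpers assume the cosine mapping documented for Azure AI Search (score = 1 / (1 + distance), with distance = 1 - similarity). The class name HybridFilterSketch and the Filter signature are invented for illustration.

```csharp
using System.Collections.Generic;

public static class HybridFilterSketch
{
    // Assumed cosine mapping: score = 1 / (1 + (1 - similarity)) = 1 / (2 - similarity).
    public static double CosineSimilarityToScore(double similarity) => 1.0 / (2.0 - similarity);

    // Inverse of the mapping above.
    public static double ScoreToCosineSimilarity(double score) => 2.0 - (1.0 / score);

    // Returns the relevance values that survive the threshold.
    // Hybrid: compare and return the raw score. Vector only: convert both ways.
    public static IEnumerable<double> Filter(IEnumerable<double?> rawScores, double minRelevance, bool useHybridSearch)
    {
        double minScore = useHybridSearch ? minRelevance : CosineSimilarityToScore(minRelevance);
        foreach (double? raw in rawScores)
        {
            double score = raw ?? 0;
            if (score < minScore) { continue; }
            yield return useHybridSearch ? score : ScoreToCosineSimilarity(score);
        }
    }
}
```

With hybrid enabled, a caller that keeps the usual 0.5 threshold would get nothing back for scores around 0.03, so minRelevance has to be chosen on the RRF scale (for example 0.02) when the flag is on.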

Please have a look and let me know if any change is required. Thanks!

/cc @heidarj

luismanez avatar Apr 23 '24 20:04 luismanez