Azure Cognitive Search: improvements and issues
NOTE: to test these changes, I created a local copy of the Connectors.Memory.AzureCognitiveSearch project.
Issue 1: The IsReference field of the AzureCognitiveSearchRecord class should not be mandatory. It should be a nullable bool (bool?) instead of bool. This lets us create an indexer, skillset, and index that the service populates by itself, using a Shaper skill to build a JSON dictionary for the AdditionalMetadata property.
With this change, line 320 in AzureCognitiveSearchMemory.cs, where the new instance of MemoryRecordMetadata is created, must become: `isReference: data.IsReference.GetValueOrDefault(false),`
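Why `bool?` matters can be seen without the Azure SDK. This is a minimal sketch, assuming the search client deserializes documents with System.Text.Json (the default serializer of Azure.Search.Documents) and that an IsReference field the indexer never populated comes back as a JSON `null`. A `bool` target rejects that null, while `bool?` accepts it and `GetValueOrDefault(false)` maps it to `false`.

```csharp
using System;
using System.Text.Json;

// Assumption: an IsReference field that exists in the index schema but was never populated
// (e.g. a document written by a service-side indexer/skillset) is returned as JSON null.
const string unpopulatedField = "null";

// With a non-nullable bool, System.Text.Json rejects the null token.
bool boolDeserializationFailed;
try
{
    JsonSerializer.Deserialize<bool>(unpopulatedField);
    boolDeserializationFailed = false;
}
catch (JsonException)
{
    boolDeserializationFailed = true;
}

// With bool? the null token is accepted as "no value"...
bool? isReference = JsonSerializer.Deserialize<bool?>(unpopulatedField);

// ...and GetValueOrDefault(false) yields the plain bool that MemoryRecordMetadata expects.
bool mappedIsReference = isReference.GetValueOrDefault(false);

Console.WriteLine($"bool failed: {boolDeserializationFailed}");
Console.WriteLine($"bool? value: {isReference?.ToString() ?? "null"}");
Console.WriteLine($"mapped value: {mappedIsReference}");
```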
Issue 2: When the indexers and index are created in the Cognitive Search service, the SearchAsync method of the memory store performs a semantic search but retrieves only the text mapped to the document/content index field, not the captions, which are very important for building dynamic prompts for OpenAI.
In my code I decided to concatenate the CaptionResult list into a single string and assign the result to the description property.
Below is the modified ToMemoryRecordMetadata method:
```csharp
// New: optional captions parameter (requires using System.Linq for Select).
private static MemoryRecordMetadata ToMemoryRecordMetadata(AzureCognitiveSearchRecord data, IList<CaptionResult>? captions = null)
{
    // New: join the semantic captions, if any, into a single string.
    string? captionstr = null;
    if (captions != null)
    {
        captionstr = string.Join(" ", captions.Select(caption => caption.Text));
    }

    return new MemoryRecordMetadata(
        isReference: data.IsReference.GetValueOrDefault(false),
        id: DecodeId(data.Id),
        text: data.Text ?? string.Empty,
        // New: fall back to the captions when the record has no description.
        description: data.Description ?? captionstr ?? string.Empty,
        externalSourceName: data.ExternalSourceName,
        additionalMetadata: data.AdditionalMetadata ?? string.Empty);
}
```
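The caption fallback can be checked in isolation. The sketch below is hypothetical: `BuildDescription` is not part of the connector, and it takes caption texts as plain strings instead of `CaptionResult` objects so that it runs without the Azure SDK. The joining and the `??` chain mirror the method above.

```csharp
using System;
using System.Collections.Generic;

#nullable enable

// Hypothetical helper mirroring ToMemoryRecordMetadata: join the caption texts with spaces
// and use them only when the stored description is null.
static string BuildDescription(string? storedDescription, IList<string>? captionTexts)
{
    string? captionstr = captionTexts == null ? null : string.Join(" ", captionTexts);
    return storedDescription ?? captionstr ?? string.Empty;
}

var captions = new List<string>
{
    "Semantic ranking returns extractive captions.",
    "They summarize the matching passage.",
};

// No stored description: the joined captions become the description.
Console.WriteLine(BuildDescription(null, captions));

// A stored description always wins over the captions.
Console.WriteLine(BuildDescription("Stored description", captions));

// Neither is available: fall back to an empty string.
Console.WriteLine(BuildDescription(null, null).Length);
```

Because `??` only tests for `null`, a record whose stored description is an empty string keeps that empty string and never falls back to the captions. If the index may store empty descriptions, `string.IsNullOrEmpty` would be the stricter check.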
Then the SearchAsync method needs to be modified as follows:
```csharp
public async IAsyncEnumerable<MemoryQueryResult> SearchAsync(
    string collection,
    string query,
    int limit = 1,
    double minRelevanceScore = 0.7,
    bool withEmbeddings = false,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    collection = NormalizeIndexName(collection);

    var client = this.GetSearchClient(collection);

    // TODO: use vectors
    var options = new SearchOptions
    {
        QueryType = SearchQueryType.Semantic,
        SemanticConfigurationName = "default",
        QueryLanguage = "en-us",
        Size = limit,
        // New: ask the semantic ranker to return extractive captions.
        QueryCaption = QueryCaptionType.Extractive,
    };

    Response<SearchResults<AzureCognitiveSearchRecord>>? searchResult = null;
    try
    {
        searchResult = await client
            .SearchAsync<AzureCognitiveSearchRecord>(query, options, cancellationToken: cancellationToken)
            .ConfigureAwait(false);
    }
    catch (RequestFailedException e) when (e.Status == 404)
    {
        // Index not found, no data to return
    }

    if (searchResult != null)
    {
        await foreach (SearchResult<AzureCognitiveSearchRecord>? doc in searchResult.Value.GetResultsAsync())
        {
            if (doc.RerankerScore < minRelevanceScore) { break; }

            // New: pass the captions so they can populate the description.
            yield return new MemoryQueryResult(ToMemoryRecordMetadata(doc.Document, doc.Captions), doc.RerankerScore ?? 1, null);
        }
    }
}
```
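The relevance filter at the end of SearchAsync relies on two behaviors that are easy to miss. First, `break` stops at the first result below `minRelevanceScore`, which is only correct if results arrive in descending reranker-score order (assumed here, as semantic ranking orders them). Second, a `null` `RerankerScore` never compares as lower, so such a document is kept and reported with relevance 1. A sketch with a hypothetical `FilterByRerankerScore` helper and made-up scores, no Azure SDK involved:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper reproducing the loop in SearchAsync on plain (id, score) pairs.
static List<(string Id, double Relevance)> FilterByRerankerScore(
    IEnumerable<(string Id, double? RerankerScore)> docs, double minRelevanceScore)
{
    var kept = new List<(string Id, double Relevance)>();
    foreach (var doc in docs)
    {
        // Lifted comparison: a null score is never "less than", so such a document is kept.
        if (doc.RerankerScore < minRelevanceScore) { break; }

        // As in SearchAsync, a missing score is reported as relevance 1.
        kept.Add((doc.Id, doc.RerankerScore ?? 1));
    }
    return kept;
}

// Made-up scores, ordered from highest to lowest as the loop assumes.
var results = new List<(string Id, double? RerankerScore)>
{
    ("doc-a", 2.9),
    ("doc-b", 1.4),
    ("doc-c", 0.5),
    ("doc-d", 0.3),
};

foreach (var (id, relevance) in FilterByRerankerScore(results, 0.7))
{
    Console.WriteLine($"{id}: {relevance}");
}
```

If the results were not sorted, `continue` would be needed instead of `break`, at the cost of scanning every returned document.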
@dluc will take a look
Thanks for the feedback. Yes, these are known issues of the IMemoryStore implementations in the kernel repo. My recommendation is to move to our new Memory solution, which will address all the pain points, offer new features, and allow decoupling AI Orchestration from AI State. See https://github.com/microsoft/semantic-memory; here are the steps to get started:
- Familiarize yourself with the component, e.g. use the serverless memory for an easy tour
- Deploy the memory as a web service, as an internal service not exposed to the public
- Use the memory web service's OpenAPI (Swagger) definition to consume memory as a plugin
All .Net issues prior to 1-Dec-2023 are being closed. Please re-open, if this issue is still relevant to the .Net Semantic Kernel 1.x release. In the future all issues that are inactive for more than 90 days will be labelled as 'stale' and closed 14 days later.