.Net: Python: API for listing/filtering records without similarity search

Open tlecomte opened this issue 1 year ago • 4 comments

Describe the bug: IVectorStoreRecordCollection does not have a method to list all the keys stored in the collection. Without one, there is no way to compare the contents of the collection with the source and find which keys should be deleted because they are no longer in the source data.

To Reproduce Steps to reproduce the behavior:

  1. Go to https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Connectors/VectorData.Abstractions/VectorStorage/IVectorStoreRecordCollection.cs
  2. Observe that there are methods to get by key, delete by key, upsert by key, but no method to list all keys

Expected behavior: We would expect to have something like IVectorStoreRecordCollection.ListAsync. That would make it possible to find which keys need to be deleted.
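To make the use case concrete, here is a minimal Python sketch of the sync step a key-listing method would enable. Everything in it is an assumption for illustration: `InMemoryCollection` is a hypothetical stand-in, not a Semantic Kernel class, and `list_keys` is the kind of method this issue requests, not an existing API.

```python
from __future__ import annotations

from typing import Iterable, Iterator


class InMemoryCollection:
    """Hypothetical stand-in for a vector store record collection."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def upsert(self, key: str, record: dict) -> None:
        self._records[key] = record

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._records.pop(key, None)

    def list_keys(self) -> Iterator[str]:
        # Hypothetical: the listing method this issue asks for.
        # Snapshot the keys so callers can delete while iterating.
        yield from list(self._records)


def find_stale_keys(collection: InMemoryCollection, source_keys: Iterable[str]) -> set[str]:
    """Return keys that are in the collection but no longer in the source."""
    return set(collection.list_keys()) - set(source_keys)


def prune(collection: InMemoryCollection, source_keys: Iterable[str]) -> set[str]:
    """Delete every stale key and return the set that was removed."""
    stale = find_stale_keys(collection, source_keys)
    for key in stale:
        collection.delete(key)
    return stale
```

With keys `a`, `b`, `c` in the collection and `a`, `c`, `d` in the source, `prune` would remove only `b`. Without a listing method, the set difference cannot be computed from the collection side at all.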

Screenshots N/A

Platform

  • OS: all
  • IDE: N/A
  • Language: C#
  • Source: main branch of repository

Additional context N/A

tlecomte avatar Dec 05 '24 18:12 tlecomte

We should also consider being able to control the sort order as part of this. To reliably page through an entire dataset, it is valuable to sort on a field that will not change, so that records are not missed between pages. This may of course not be supported by all vector DBs, so some analysis is required to validate whether this is feasible.
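As a rough illustration of why a stable sort field matters, here is a Python sketch of cursor-style paging over the record key, which does not change once a record is written. `fetch_page` is a hypothetical stand-in for a store-side query (sorted by key, filtered to keys after a cursor); it is not an existing connector API, and real stores may not support this.

```python
from __future__ import annotations

from typing import Iterator, Optional


def fetch_page(records: dict[str, dict], after_key: Optional[str], page_size: int) -> list[str]:
    """Hypothetical page query: keys in ascending order, strictly after the cursor."""
    keys = sorted(k for k in records if after_key is None or k > after_key)
    return keys[:page_size]


def iter_all_keys(records: dict[str, dict], page_size: int = 2) -> Iterator[str]:
    """Walk the whole dataset page by page, using the last key seen as the cursor."""
    after_key: Optional[str] = None
    while True:
        page = fetch_page(records, after_key, page_size)
        if not page:
            return
        yield from page
        # The key never changes, so the cursor position stays valid
        # even if other fields of the records are updated between pages.
        after_key = page[-1]
```

If the sort were instead on a mutable field (for example a last-modified timestamp), a record updated between two page requests could move across the cursor and be skipped or returned twice.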

westey-m avatar Jan 27 '25 15:01 westey-m

Seems like we also have #10295 tracking the same thing. I propose we use this issue to track the Python side of the feature, and #10295 to track the .NET side.

roji avatar Mar 10 '25 08:03 roji

We also already have this for Python: #9911

eavanvalkenburg avatar Mar 10 '25 10:03 eavanvalkenburg

@eavanvalkenburg ah, so maybe this can be closed?

roji avatar Mar 13 '25 17:03 roji