[MEDI] Allow extending VectorStoreWriter

Open roji opened this issue 2 months ago • 2 comments

VectorStoreWriter is currently sealed, but there are some good reasons to allow specialized implementations for specific databases, which would extend it. Such specialized implementations could be delivered with the MEVD providers; e.g. the Qdrant MEVD provider package could provide a QdrantStoreWriter (though we'd have to be OK with the added reference to MEDI.Abstractions).

Some reasons/motivations to specialize:

  • Work around provider-specific limitations. For example, we have a (temporary) hack to identify Qdrant, where we use GUIDs (as opposed to strings in all other databases). We'd instead do that in QdrantVectorWriter (the alternative is to expose type support metadata from MEVD itself, issue).
  • Different databases prefer different kinds of GUIDs, and index those much more efficiently (though see #12182 and #11485 as better alternatives):
    • Generate UUIDv7 for PostgreSQL, where they're much more efficient for indexing than random UUIDv4
    • SQL Server has its own special "sequential GUIDs" (link).
  • Account for the different Top values supported by different databases (see this)

Of course, unsealing VectorStoreWriter isn't enough - we'd need to expose selected protected APIs to actually make it useful. An alternative is for specialized implementations to simply not extend VectorStoreWriter and duplicate its code instead, but there seems to be enough actual logic in there to justify extensibility, I think.
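To make the "selected protected APIs" idea concrete, here is a minimal sketch of what such an extension point could look like. Everything in it is an assumption for illustration: `SketchWriter`, `PostgresSketchWriter`, `GenerateKey` and `Prepare` are hypothetical names and do not reflect the actual VectorStoreWriter surface. The only real API used is `Guid.CreateVersion7()`, which requires .NET 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an unsealed writer. This is NOT the real
// VectorStoreWriter API; all names here are invented for illustration.
public class SketchWriter
{
    // Hypothetical protected hook: subclasses decide how record keys are generated.
    // The default is a random UUIDv4.
    protected virtual Guid GenerateKey() => Guid.NewGuid();

    // Shared logic that a subclass inherits instead of duplicating.
    public IReadOnlyList<(Guid Key, string Text)> Prepare(IEnumerable<string> chunks)
    {
        var records = new List<(Guid Key, string Text)>();
        foreach (string chunk in chunks)
        {
            records.Add((GenerateKey(), chunk));
        }
        return records;
    }
}

// Hypothetical PostgreSQL specialization that switches to time-ordered UUIDv7 keys.
public sealed class PostgresSketchWriter : SketchWriter
{
    // Guid.CreateVersion7 is available starting with .NET 9.
    protected override Guid GenerateKey() => Guid.CreateVersion7();
}
```

With a hook like this, a provider package would only override key generation and inherit the rest of the pipeline, which is the trade-off against duplicating the writer's logic in a standalone implementation.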

roji, Oct 27 '25 17:10

> An alternative is for specialized implementations to simply not extend VectorStoreWriter and duplicate its code instead, but there seems to be enough actual logic in there to justify extensibility, I think.

I am sharing an example of how this can be achieved: https://github.com/adamsitnik/dataingestion/blob/5e46702aac10c2d535989e88e10986880150fe8e/src/Samples/FAQ.cs#L10-L71

adamsitnik, Oct 28 '25 16:10

Sure - that's if someone wants to do their own custom implementation of a VectorStoreCollection wrapper, which may be fine in most cases. But depending on how much logic we end up packing into VectorStoreWriter, it may be useful to be able to extend it etc.

roji, Oct 28 '25 17:10