refactor: Add generic type to `VectorStoreIndex` trait

Open cvauclair opened this issue 4 months ago • 0 comments

  • [x] I have looked for existing issues (including closed) about this

Feature Request

Refactor the VectorStoreIndex trait to add a generic type parameter representing the type of the documents stored in the store. This would remove the generic type parameter from the top_n method.

Motivation

The goal of this change is to improve the developer experience when working with vector stores. Specifically, it solves the problem where developers have to specify the document type associated with a vector store twice. For instance, InMemoryVectorStore is already parametrized by a generic type D, yet the type T of its top_n implementation cannot be inferred, even though it should be the same as the store's type D. A similar situation occurs with MongoDbVectorStore, whose constructor takes a Collection<T>, which implies that the return type of the top_n method is also T (currently you have to specify it twice).
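To make the problem concrete, here is a minimal std-only sketch. It is not rig's actual code: `MethodGenericIndex`, `ToyStore` and `demo_method_generic` are hypothetical names, the method is synchronous, and a `ToString`/`FromStr` round trip stands in for serialization. It only illustrates why a method-level type parameter has to be spelled out again at the call site.

```rust
use std::str::FromStr;

// Hypothetical, simplified stand-in for the current trait shape:
// the document type `T` is a parameter of the method, not of the trait.
trait MethodGenericIndex {
    fn top_n<T: FromStr>(&self, query: &str, n: usize) -> Vec<(f64, String, T)>;
}

// Hypothetical store that is already parametrized by its document type `D`.
struct ToyStore<D> {
    docs: Vec<(String, D)>,
}

impl<D: ToString> MethodGenericIndex for ToyStore<D> {
    fn top_n<T: FromStr>(&self, _query: &str, n: usize) -> Vec<(f64, String, T)> {
        // Round-trip through a string, standing in for serialize/deserialize.
        // The query is ignored and every hit gets a dummy score of 1.0.
        self.docs
            .iter()
            .take(n)
            .filter_map(|(id, doc)| {
                doc.to_string().parse::<T>().ok().map(|d| (1.0, id.clone(), d))
            })
            .collect()
    }
}

pub fn demo_method_generic() -> Vec<(f64, String, u32)> {
    let store = ToyStore {
        docs: vec![("a".to_string(), 1u32), ("b".to_string(), 2u32)],
    };
    // The compiler cannot derive `T` from `ToyStore<u32>`; the caller has to
    // repeat the document type here (or in a type annotation).
    store.top_n::<u32>("query", 1)
}
```

In this sketch the store's type already says the documents are `u32`, yet the call still needs `::<u32>` (or an equivalent annotation) to compile.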

Proposal

Refactor the VectorStoreIndex trait like so:

pub trait VectorStoreIndex<T: for<'a> Deserialize<'a>>: Send + Sync {
    /// Get the top n documents based on the distance to the given query.
    /// The result is a list of tuples of the form (score, id, document)
    fn top_n(
        &self,
        query: &str,
        n: usize,
    ) -> impl std::future::Future<Output = Result<Vec<(f64, String, T)>, VectorStoreError>> + Send;

    /// Same as `top_n` but returns the document ids only.
    fn top_n_ids(
        &self,
        query: &str,
        n: usize,
    ) -> impl std::future::Future<Output = Result<Vec<(f64, String)>, VectorStoreError>> + Send;
}
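As a rough illustration of how an implementation could look under the proposed shape, here is a simplified std-only sketch. The names `VectorStoreIndexSketch`, `ToyStore` and `demo_trait_generic` are hypothetical, the `Deserialize` bound and the async return type are dropped for brevity, and the scoring is a toy substring match rather than an embedding distance.

```rust
// Hypothetical, simplified stand-in for the proposed trait shape:
// the document type `T` is a parameter of the trait itself.
trait VectorStoreIndexSketch<T> {
    fn top_n(&self, query: &str, n: usize) -> Vec<(f64, String, T)>;
}

// Hypothetical store parametrized by its document type `D`.
struct ToyStore<D> {
    docs: Vec<(String, D)>,
}

// The store's own type parameter is reused as the trait's document type,
// so the two can no longer disagree.
impl<D: Clone> VectorStoreIndexSketch<D> for ToyStore<D> {
    fn top_n(&self, query: &str, n: usize) -> Vec<(f64, String, D)> {
        // Toy scoring: 1.0 if the id contains the query, otherwise 0.0.
        let mut scored: Vec<(f64, String, D)> = self
            .docs
            .iter()
            .map(|(id, doc)| {
                let score = if id.contains(query) { 1.0 } else { 0.0 };
                (score, id.clone(), doc.clone())
            })
            .collect();
        // Highest score first, then keep the first `n` entries.
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.truncate(n);
        scored
    }
}

pub fn demo_trait_generic() -> Vec<(f64, String, u32)> {
    let store = ToyStore {
        docs: vec![("apple".to_string(), 1u32), ("banana".to_string(), 2u32)],
    };
    // No turbofish needed: the document type follows from `ToyStore<u32>`.
    store.top_n("ban", 1)
}
```

With the type parameter on the trait, the call site no longer needs to name the document type; it is fixed once, when the store is constructed.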

cvauclair, Oct 15 '24 16:10