
Optimize Request for Embedding in Vector Store

Open ricken07 opened this issue 1 year ago • 2 comments

Currently, the vector store automatically calls the embedding client to generate the document embedding without checking whether the document already has one.

In this PR, I first check whether the document already has an embedding, and call the client only if it does not. This avoids redundant calls to generate embeddings.

  • Tests are green for impacted vector stores
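
A minimal sketch of the check described above, not the actual diff from this PR. The types `SketchDocument`, `SketchEmbeddingClient`, and `SketchVectorStore` are hypothetical stand-ins and are not the Spring AI classes; the assumption is that a missing or empty array means "no embedding yet".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an embedding client (not the Spring AI interface).
interface SketchEmbeddingClient {
    float[] embed(String text);
}

// Hypothetical stand-in for a document that may carry a pre-computed embedding.
final class SketchDocument {
    final String content;
    float[] embedding; // null or empty is treated as "not embedded yet" (assumption)

    SketchDocument(String content, float[] embedding) {
        this.content = content;
        this.embedding = embedding;
    }

    boolean hasEmbedding() {
        return embedding != null && embedding.length > 0;
    }
}

// Hypothetical in-memory store that only embeds documents lacking an embedding.
final class SketchVectorStore {
    private final SketchEmbeddingClient client;
    private final List<SketchDocument> stored = new ArrayList<>();

    SketchVectorStore(SketchEmbeddingClient client) {
        this.client = client;
    }

    void add(List<SketchDocument> documents) {
        for (SketchDocument doc : documents) {
            // Skip the (potentially expensive) client call when an embedding is present.
            if (!doc.hasEmbedding()) {
                doc.embedding = client.embed(doc.content);
            }
            stored.add(doc);
        }
    }

    int size() {
        return stored.size();
    }
}
```

Note that this sketch cannot tell whether an existing embedding was produced by the same model as the one registered with the store; it only checks for presence.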

ricken07 avatar May 13 '24 15:05 ricken07

If I'm not mistaken, this is the same as or related to https://github.com/spring-projects/spring-ai/pull/413 ?

But this change comes with some risks. For example, it is not clear when one would have to invalidate the pre-computed embedding (e.g. the index). Likely when... Also, I'm not sure how useful this feature would be. What is the use case where you would repeatedly use the same Documents (with pre-computed embeddings) for searching? Or what are the reasons you might want to re-add a document that has a pre-computed embedding?

Maybe I'm missing some interesting use cases?

Right now we do not allow the Vector Store to use any embeddings other than those computed by the embedding model registered with the VectorStore. Using the embedding field would allow one to pre-compute the embeddings externally with a different embedding model, and the VectorStore would then store the document with the externally computed embedding. But I'm not sure if this is a real or needed use case, nor if this is the right approach to support it.

tzolov avatar Aug 21 '24 05:08 tzolov

If the pre-computed embeddings are not applicable/useful for real use cases, IMO, we should remove the embedding field from the Document class.

tzolov avatar Aug 21 '24 05:08 tzolov