kernel-memory [Feature Request] Cache and manage the embeddings in a persistent storage

Context / Scenario

This post digs deeper into the topic raised in this PR: https://github.com/microsoft/kernel-memory/pull/389

The problem

The problem is simple: we want to call the embedding API as rarely as possible, since it is often slow and expensive. One quick and cheap solution is to cache embeddings by content hash, so that repeated content is embedded only once when KM is fed a large document, or several documents, containing duplicated text (that is all the PR above is about).
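To make the idea concrete, here is a minimal sketch of content-hash caching within a single ingestion run. It is not the code from the PR: `fake_embed` is a hypothetical stand-in for the real embedding API, and SHA-256 is an assumed choice of hash.

```python
import hashlib

def content_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of the partition text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

calls = 0  # counts how often the (fake) embedding API is hit

def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a slow, paid embedding API call."""
    global calls
    calls += 1
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def embed_partitions(partitions: list[str]) -> list[list[float]]:
    """Embed each partition, reusing vectors for identical content."""
    cache: dict[str, list[float]] = {}  # lives only for this ingestion run
    result = []
    for text in partitions:
        key = content_hash(text)
        if key not in cache:
            cache[key] = fake_embed(text)
        result.append(cache[key])
    return result

# "license text" appears twice, so it is embedded only once.
vectors = embed_partitions(["intro", "license text", "body", "license text"])
```

Because the dictionary is created inside `embed_partitions`, nothing survives to the next ingestion, which is exactly the limitation discussed next.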

BUT, I don't think this is an ideal solution for real world scenarios. Why? Because:

In most cases, repeated text or paragraphs are rare.
The PR above only helps within the scope of the current document ingestion.

Let's skip the first one and go straight into the second scenario:

There are many cases where we want to update existing documents or re-ingest them because their content has been refreshed, whether the source is a text document or a web page. In both cases most of the content remains the same, but the embeddings are generated again and again, even when you re-import using the same document id. I believe this is a scenario where a persistent embedding cache is needed to improve the speed and reduce the cost of continuously ingested documents.

Proposed solution

In addition to the FileStorageDb and MemoryDb used for vectors and text, we could add another abstraction and implementation, an EmbeddingsCacheDb, that can be configured and used by the GenerateEmbeddingsHandler to avoid regenerating embeddings for the same partitioned content over time and across workers. Ideally, the content hash would be stored in a distributed cache such as Redis and the associated embeddings in blob storage, so that the cache works across multiple workers.

We might need to redesign or update how embeddings are stored, so that it is easy to check whether an embedding already exists for a given content hash and we don't store it twice. Ideally only an additional mapping from hash to embedding is needed, or the hash could be included in the entity name itself.
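As a rough illustration of the "hash in the entity name" variant, the sketch below stores one JSON file per content hash in a local folder. The folder stands in for blob storage, and `DiskEmbeddingCache`, `ingest` and `fake_embed` are hypothetical names, not existing KM types.

```python
import hashlib
import json
import tempfile
from pathlib import Path

class DiskEmbeddingCache:
    """Hypothetical persistent cache: one JSON file per content hash."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, text: str) -> Path:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.json"  # the hash is the entity name

    def get(self, text: str):
        path = self._path(text)
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, text: str, vector: list[float]) -> None:
        self._path(text).write_text(json.dumps(vector))

def ingest(partitions, cache, embed):
    """Return (vectors, number of embedding calls actually made)."""
    vectors, misses = [], 0
    for text in partitions:
        vector = cache.get(text)
        if vector is None:
            vector = embed(text)
            cache.put(text, vector)
            misses += 1
        vectors.append(vector)
    return vectors, misses

def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for the real embedding API."""
    return [float(len(text))]

cache_dir = Path(tempfile.mkdtemp())
_, first_run = ingest(["a", "b", "c"], DiskEmbeddingCache(cache_dir), fake_embed)
# A new cache object over the same folder simulates a restart or another worker.
_, second_run = ingest(["a", "b changed", "c"], DiskEmbeddingCache(cache_dir), fake_embed)
```

On the second run only the changed partition misses the cache, which is the re-ingestion saving described above. A real implementation would also need to deal with concurrent writers and with keys that include the embedding model.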

Users should be able to:

Customize the storage type and location of this cache.
Control the behavior of this cache through config (e.g. a maximum storage limit).
Invalidate the cache according to a policy (e.g. all cached embeddings associated with a document should be removed when that document is deleted by document id or index).
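The last two requirements could look roughly like the sketch below. Everything here is an assumption made for illustration: `CacheOptions.max_entries` is a hypothetical setting, eviction is least-recently-used, and each entry tracks which documents reference it so that deleting one document does not drop content still used by another.

```python
from collections import OrderedDict
from dataclasses import dataclass

@dataclass
class CacheOptions:
    """Hypothetical settings; names are illustrative, not KM configuration."""
    max_entries: int = 1000

class ManagedEmbeddingCache:
    def __init__(self, options: CacheOptions) -> None:
        self.options = options
        self.vectors: OrderedDict[str, list[float]] = OrderedDict()
        self.owners: dict[str, set[str]] = {}  # content hash -> document ids

    def put(self, document_id: str, key: str, vector: list[float]) -> None:
        self.vectors[key] = vector
        self.vectors.move_to_end(key)  # mark as most recently used
        self.owners.setdefault(key, set()).add(document_id)
        while len(self.vectors) > self.options.max_entries:
            oldest, _ = self.vectors.popitem(last=False)  # evict least recent
            self.owners.pop(oldest, None)

    def get(self, key: str):
        if key in self.vectors:
            self.vectors.move_to_end(key)
            return self.vectors[key]
        return None

    def invalidate_document(self, document_id: str) -> int:
        """Drop entries that only this document references; return the count."""
        removed = 0
        for key in list(self.owners):
            self.owners[key].discard(document_id)
            if not self.owners[key]:
                del self.owners[key]
                del self.vectors[key]
                removed += 1
        return removed

cache = ManagedEmbeddingCache(CacheOptions(max_entries=3))
cache.put("doc-1", "h1", [0.1])
cache.put("doc-1", "h2", [0.2])
cache.put("doc-2", "h2", [0.2])  # same content appears in a second document
removed = cache.invalidate_document("doc-1")
```

Here deleting `doc-1` removes only `h1`, because `h2` is still referenced by `doc-2`. Whether shared entries should survive a delete is a policy decision that the config could expose.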

Importance

would be great to have

Apr 01 '24 22:04 0x7c13

Posting here some notes from the PR:

KM uses multiple embedding generators, so it's important not to consider only the content, but also the generator used and the underlying configuration, e.g. which model.
The cache should persist across reboots to provide some benefit, and should be shared over the network in cases where KM runs on multiple nodes.
As a persistence layer I would consider reusing the available Content Storage, which is itself configurable to store data on disk, Azure blobs, or MongoDB.

In an early SK prototype I cached embeddings in the underlying HTTP layer of the embedding generator, so I could use as a cache key the AI provider (e.g. the OpenAI endpoint), the AI model name, and other parameters contributing to uniqueness. These parameters are more easily available inside the generator than in the code calling the generator.
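A composite cache key along those lines might be built as in the sketch below. The function name, the field names and the example endpoint and model names are all hypothetical; the point is only that everything affecting the resulting vector goes into the hash.

```python
import hashlib
import json

def embedding_cache_key(endpoint: str, model: str, params: dict, text: str) -> str:
    """Hash everything that can change the resulting vector, not just the text."""
    payload = json.dumps(
        {"endpoint": endpoint, "model": model, "params": params, "text": text},
        sort_keys=True,      # stable ordering, so equal inputs hash equally
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

text = "Kernel Memory partitions documents before embedding them."
endpoint = "https://api.example.com"  # placeholder, not a real provider URL
key_small = embedding_cache_key(endpoint, "model-small", {"dimensions": 256}, text)
key_large = embedding_cache_key(endpoint, "model-large", {"dimensions": 256}, text)
key_again = embedding_cache_key(endpoint, "model-small", {"dimensions": 256}, text)
```

The same text yields different keys for different models, while identical inputs always map to the same key, so vectors from one generator are never served to another.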

My recommendation would be to integrate caching behavior inside the generators, rather than adding a cache to each client that calls an embedding generator.
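One way to read this recommendation is a wrapper that has the same interface as the generator it wraps, so callers need no caching logic of their own. This is a sketch under that assumption, not KM's actual generator interface: `EmbeddingGenerator`, `FakeGenerator` and `CachingGenerator` are hypothetical, and a plain dict stands in for the key-value store.

```python
import hashlib
from typing import Protocol

class EmbeddingGenerator(Protocol):
    model: str
    def embed(self, text: str) -> list[float]: ...

class FakeGenerator:
    """Hypothetical generator that counts calls instead of hitting an API."""
    def __init__(self, model: str) -> None:
        self.model = model
        self.calls = 0

    def embed(self, text: str) -> list[float]:
        self.calls += 1
        return [float(len(text)), float(len(self.model))]

class CachingGenerator:
    """Wraps any generator; callers use it exactly like the inner one."""
    def __init__(self, inner: EmbeddingGenerator, store: dict[str, list[float]]) -> None:
        self.inner = inner
        self.model = inner.model
        self.store = store  # any key-value store; a dict keeps the sketch small

    def embed(self, text: str) -> list[float]:
        # The model name is part of the key, so generators never share entries.
        key = hashlib.sha256(f"{self.model}\n{text}".encode("utf-8")).hexdigest()
        if key not in self.store:
            self.store[key] = self.inner.embed(text)
        return self.store[key]

shared_store: dict[str, list[float]] = {}
small = FakeGenerator("model-small")
large = FakeGenerator("model-large")
cached_small = CachingGenerator(small, shared_store)
cached_large = CachingGenerator(large, shared_store)

first = cached_small.embed("hello")
second = cached_small.embed("hello")  # served from the store
other = cached_large.embed("hello")   # different model, so a separate entry
```

Since the wrapper owns the key construction, handlers and other callers stay unchanged, and swapping the dict for a shared store would not affect them either.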

Apr 08 '24 21:04 dluc

Looks like the PR has become stale, with a few things to address.

If this is a pressing problem, the approach should be reusable (e.g. not requiring caching logic in every handler; caching is usually a cross-cutting concern solved with generic KV stores decoupled from specific scenarios), scale across multiple VMs (e.g. allow extending the solution with Redis/Memcached), and be optional via config settings.

May 17 '24 02:05 dluc