Vector support for embeddings from locally running language models
I wanted to bring this up again because no discussion has developed since 2023 and we are making huge leaps in directions where locally running language models are becoming more common.
https://github.com/litedb-org/LiteDB/issues/2364
It is currently very hard to find a locally running database that supports both vector storage and vector queries. Actually, storage is not the problem.
For example, storing an n-dimensional vector in SQLite is not the problem.
In my case, I create an embedding with https://github.com/dotnet/smartcomponents. But the problem is the search. There is no native way to use cosine similarity or nearest neighbor for vectors stored in SQLite without using an extension like sqlite-vec (https://github.com/asg017/sqlite-vec). So it is necessary to query all vectors and do the similarity search manually.
In .NET there are already some implementations for in-memory vector storage, like https://github.com/Build5Nines/SharpVector, but nothing locally persistent, and here a native implementation for LiteDB would be great!
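To make the manual approach concrete, here is a minimal sketch of a brute-force search: load every stored vector, score it against the query with cosine similarity, and keep the best k. The class name `BruteForceVectorSearch`, the `TopK` helper, and the `(Id, Vector)` row shape are assumptions for illustration only; they are not part of SQLite, LiteDB, or any library mentioned here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an existing library API.
public static class BruteForceVectorSearch
{
    // Cosine similarity of two equal-length vectors.
    // Returns 0 when either vector has zero length (norm).
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every row against the query and returns the k best matches,
    // highest similarity first. Cost is linear in the number of rows.
    public static List<(int Id, double Score)> TopK(
        float[] query, IEnumerable<(int Id, float[] Vector)> rows, int k)
    {
        return rows
            .Select(r => (r.Id, Score: CosineSimilarity(query, r.Vector)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }
}
```

This works for small collections, but every query has to read and score all rows, which is exactly why native index support in the database would help.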
Initially it would be enough for me to just add a ByteArray or a FloatArray and have similarity and nearest-neighbor search functions on it.
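As a rough illustration of the ByteArray variant, a float vector can be packed into a byte array and unpacked again as sketched below. `VectorBlob`, `ToBytes`, and `FromBytes` are hypothetical names, and the bytes are written in the machine's native byte order, so this assumes the data is read back on a platform with the same endianness.

```csharp
using System;

// Hypothetical helper, not an existing LiteDB API.
public static class VectorBlob
{
    // Packs a float vector into a byte array (4 bytes per element, native byte order).
    public static byte[] ToBytes(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Restores the float vector from a byte array produced by ToBytes.
    public static float[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Byte length must be a multiple of 4.");

        var vector = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
        return vector;
    }
}
```

Storing the raw bytes keeps the document small, but the database cannot interpret them, so the similarity search still has to happen in application code unless the engine understands the vector type itself.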
Features are currently mostly submitted by the community. If you want to open a PR for that, I would be happy to accept it :)
@JKamsker
Thanks, I'll have to check out the source.
Do you have any recommendations on where I should start?
@Agredo, hello. I would like to create (or extend) a RAG app in C#. Do you know where I can get a small, runnable solution to extend? I have many (100k+) docx/xlsx/pdf documents as HTML. My goal is to replace Lucene search with AI search. I use LiteDB as storage.
Of course, I think I will create something for vectors myself, if only because I once needed to deal with GDPR with LiteDB as storage and created some NuGet packages for that: https://www.nuget.org/packages/Lpd (main)
Hey @qart2003,
maybe this helps you:
https://jamiemaguire.net/index.php/2024/09/01/semantic-kernel-implementing-100-local-rag-using-phi-3-with-local-embeddings/
You should take a look at Microsoft.Extensions.AI and, for example, a transformer ONNX embedding model like https://huggingface.co/SmartComponents/bge-micro-v2
or https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
@Agredo thank you
I could not get a working example :)
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: "Error opening c:\00\onnx\models\local\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data\genai_config.json"
I can't have a file and a directory with the same name in one folder, which is why the example is impossible to use. Of course, maybe I just don't know how to set it up.
The following lines are as-is:
var modelPath = @"c:\00\onnx\models\local\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data";
var modelId = "localphi3onnx";
var textModelPath = @"c:\00\onnx\models\local\model_q.onnx";
var foo = @"c:\00\onnx\models\local\vocab.txt";
// Load the model and services
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
//builder.AddOnnxRuntimeGenAIChatCompletion(modelPath);
//builder.AddBertOnnxTextEmbeddingGeneration(modelId, modelPath);
//builder.AddLocalTextEmbeddingGeneration();
builder.AddBertOnnxTextEmbeddingGeneration(textModelPath, foo);
// Build Kernel
var kernel = builder.Build();
Implemented in https://github.com/litedb-org/LiteDB/releases/tag/v6.0.0-prerelease.0052 - go check it out :)