Vector Support for locally running Language Models embedding support

Open Agredo opened this issue 11 months ago • 2 comments

I wanted to bring this up again because no discussion has developed since 2023 and we are making huge leaps in directions where locally running language models are becoming more common.

https://github.com/litedb-org/LiteDB/issues/2364

It is currently very hard to find a locally running database that supports vector storage queries. Actually, storage is not the problem.

For example, storing an n-dimensional vector in SQLite is not the problem.

In my case, I create an embedding with https://github.com/dotnet/smartcomponents. But the problem is the search. There is no native way to use cosine similarity or nearest neighbor for vectors stored in SQLite without using an extension like sqlite-vec (https://github.com/asg017/sqlite-vec). So it is necessary to query all vectors and do the similarity search manually.
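To make the "query all vectors and search manually" step concrete, here is a minimal sketch of a brute-force cosine similarity scan. The class name `BruteForceVectorSearch`, the `TopK` helper, and the `(Id, Vector)` row shape are assumptions for illustration only; they are not part of SQLite, LiteDB, or SmartComponents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a library API: scores every stored vector in memory.
public static class BruteForceVectorSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Compares the query against every row and keeps the k highest-scoring ones.
    public static List<(string Id, double Score)> TopK(
        float[] query, IEnumerable<(string Id, float[] Vector)> rows, int k)
    {
        return rows
            .Select(r => (r.Id, Score: CosineSimilarity(query, r.Vector)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }
}
```

Every query costs one pass over all stored vectors, which is why a native index or search function in the database would help once the collection grows.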

In .NET there are already some implementations for in-memory vector storage, like https://github.com/Build5Nines/SharpVector, but nothing that persists locally, so a native implementation for LiteDB would be great!

Initially it would be enough for me to just add a ByteArray or a FloatArray and have similarity and nearest-neighbor (NN) search functions on it.
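For the ByteArray variant, a rough sketch of how an embedding could be packed into bytes and unpacked again, so that it fits into any binary field. `EmbeddingBlob` is a hypothetical helper name; only `Buffer.BlockCopy` from the .NET base library is used, and the bytes are written in the platform's native byte order.

```csharp
using System;

// Hypothetical helper, not a LiteDB API: converts between float[] and byte[].
public static class EmbeddingBlob
{
    // Packs a float vector into a byte array, 4 bytes per component.
    public static byte[] ToBytes(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Restores the float vector from a byte array produced by ToBytes.
    public static float[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Byte length must be a multiple of 4.");

        var vector = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
        return vector;
    }
}
```

A blob like this is opaque to the database, so the similarity and NN search would still have to run in application code after loading the bytes.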

Agredo avatar Jan 11 '25 14:01 Agredo

Features are currently mostly submitted by the community. If you want to open a PR for that, I would be happy to accept it :)

JKamsker avatar Jan 11 '25 16:01 JKamsker

@JKamsker

Thanks, I'll have to check out the source.

Do you have any recommendations on where I should start?

Agredo avatar Jan 14 '25 00:01 Agredo

@Agredo, hello. I would like to create (or extend) an app for RAG in C#. Do you know where I can get one (a small runnable solution to extend)? I have many (100k+) docx/xlsx/pdf documents converted to HTML. My goal is to replace Lucene search with AI search. I use LiteDB as storage.

Of course, I think I will create something for vectors, if only because once, when I needed to work with GDPR using LiteDB as storage, I created some NuGet packages for that: https://www.nuget.org/packages/Lpd (main)

qart2003 avatar Aug 01 '25 00:08 qart2003

Hey @qart2003,

maybe this helps you:

https://jamiemaguire.net/index.php/2024/09/01/semantic-kernel-implementing-100-local-rag-using-phi-3-with-local-embeddings/

You should take a look at Microsoft.Extensions.AI and, for example, a Transformer ONNX embedding model like https://huggingface.co/SmartComponents/bge-micro-v2

or https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Agredo avatar Aug 04 '25 20:08 Agredo

@Agredo thank you

working example not found :)

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: "Error opening c:\00\onnx\models\local\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data\genai_config.json"

I can't have a file and a directory with the same name in one folder, which is why it is impossible to use the example. Of course, maybe I just don't know how to set it up.

The next lines are as is:

var modelPath = @"c:\00\onnx\models\local\phi3-mini-4k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data";
var modelId = "localphi3onnx";

var textModelPath = @"c:\00\onnx\models\local\model_q.onnx";
var foo = @"c:\00\onnx\models\local\vocab.txt";

// Load the model and services
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
//builder.AddOnnxRuntimeGenAIChatCompletion(modelPath);
//builder.AddBertOnnxTextEmbeddingGeneration(modelId, modelPath);
//builder.AddLocalTextEmbeddingGeneration();
builder.AddBertOnnxTextEmbeddingGeneration(textModelPath, foo);

// Build Kernel
var kernel = builder.Build();
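Going by the exception text, the loader looked for genai_config.json inside modelPath, which suggests it treats modelPath as a folder and not as a single file. A small pre-check along these lines might narrow it down before the kernel is built. `ModelFolderCheck` and `LooksUsable` are hypothetical names for this sketch and not part of Semantic Kernel or ONNX Runtime; the expected folder layout is an assumption taken from the error message only.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper, not a Semantic Kernel or ONNX Runtime API.
public static class ModelFolderCheck
{
    // Returns true when modelPath is an existing folder that contains genai_config.json.
    // Otherwise returns false and describes the problem in the out parameter.
    public static bool LooksUsable(string modelPath, out string problem)
    {
        if (File.Exists(modelPath))
        {
            problem = $"'{modelPath}' is a file; the loader appears to expect a folder.";
            return false;
        }
        if (!Directory.Exists(modelPath))
        {
            problem = $"Folder '{modelPath}' does not exist.";
            return false;
        }
        if (!File.Exists(Path.Combine(modelPath, "genai_config.json")))
        {
            problem = $"Folder '{modelPath}' has no genai_config.json.";
            return false;
        }

        problem = string.Empty;
        return true;
    }
}
```

If the check reports that the path is a file, pointing modelPath at the folder that holds the model files and genai_config.json would be the first thing to try.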

qart2003 avatar Aug 07 '25 05:08 qart2003

Implemented in https://github.com/litedb-org/LiteDB/releases/tag/v6.0.0-prerelease.0052 - go check it out :)

JKamsker avatar Oct 06 '25 07:10 JKamsker