
What is the best storage solution currently available?

Open ZhuLinsen opened this issue 9 months ago • 3 comments

Do you need to ask a question?

  • [x] I have searched the existing question and discussions and this question is not already answered.
  • [x] I believe this is a legitimate question, not just a bug or feature request.

Your Question

  1. I need multiple knowledge bases. Neo4j Community Edition currently cannot create new databases. Does it support multiple knowledge bases?
  2. PostgreSQL does not support namespace differentiation. https://github.com/HKUDS/LightRAG/issues/814
  3. The performance of the JSON storage method is too poor. So, what is the best storage solution currently available that supports multiple knowledge bases? Thank you for your reply.

Additional Context

No response

ZhuLinsen avatar Mar 12 '25 12:03 ZhuLinsen

You can launch multiple LightRAG instances by using a different .env file in each startup directory. LightRAG is also developing a Workspace feature, which is slated for release in the near future. We encourage you to join the discussion on this topic in the dedicated thread #1016.
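As a rough illustration of the one-directory-per-knowledge-base idea, here is a minimal Python sketch that prepares a separate startup directory and .env file for each knowledge base. The variable names `PORT` and `WORKING_DIR`, the base port 9621, and the `rag_storage` folder name are assumptions made for this example; check the project's env.example for the settings your version actually reads. The helper names `write_instance_env` and `plan_instances` are hypothetical.

```python
from pathlib import Path
import tempfile


def write_instance_env(root: Path, name: str, port: int) -> Path:
    """Create <root>/<name>/.env for one knowledge base and return its path."""
    instance_dir = root / name
    storage_dir = instance_dir / "rag_storage"  # assumed folder name
    storage_dir.mkdir(parents=True, exist_ok=True)
    env_path = instance_dir / ".env"
    # PORT and WORKING_DIR are assumed variable names, not confirmed settings.
    env_path.write_text(
        f"PORT={port}\nWORKING_DIR={storage_dir}\n",
        encoding="utf-8",
    )
    return env_path


def plan_instances(root: Path, names: list[str], base_port: int = 9621) -> dict[str, Path]:
    """Give every knowledge base its own directory, .env file, and port."""
    return {
        name: write_instance_env(root, name, base_port + offset)
        for offset, name in enumerate(names)
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, env_path in plan_instances(Path(tmp), ["kb_a", "kb_b"]).items():
            print(name, env_path)
```

Each instance would then be started from inside its own directory so that it picks up the local .env file. Because every instance gets its own port and storage path, the knowledge bases stay isolated without any namespace support in the storage backend.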

danielaskdd avatar Mar 12 '25 14:03 danielaskdd

I don't think this answers the question. The author wants to know whether there is a graph_storage and kv_storage combination that works at the moment.

reqyou avatar Mar 13 '25 15:03 reqyou

I have played around a bit with various storage options. I have not done any formal measurements, but here are my observations.

The choice of graph storage significantly impacts the performance of the solution.

  1. Postgres in Docker (the Docker image that Shangor has prepared) - Too slow; I saw a bottleneck in PGVector.
  2. Azure Postgres Flex server with PGVector for graph storage - Same results.
  3. Azure Postgres Flex + Neo4j Community Edition - Better performance but significant pool thread issues.
  4. Azure Postgres Flex + NetworkX - Best performance with 5000 nodes + 10000 edges, but needs serious memory and GraphML file mounts.
  5. Azure Postgres Flex + Memgraph - Had to re-engineer the Neo4j Cypher queries to openCypher. Better than Neo4j.
  6. MongoDB Atlas for every storage - This is looking very promising. Very simple to set up and manage.

I am thinking of trying out Gremlin next.

If anyone has any insights, please share.

acsangamnerkar avatar Mar 17 '25 02:03 acsangamnerkar

Hi, any updates on this? I was able to run queries easily within 3-4 seconds with PGVector + AGE when I had loaded only 2 documents. However, when the number of documents increased to about 30, my CPU usage went to 100% and I got no response. The KG size is about 10k nodes and 20k edges.

khizarhussain19 avatar Apr 21 '25 08:04 khizarhussain19