[Question]: How to implement multi-tenant solutions?
Do you need to ask a question?
- [x] I have searched the existing question and discussions and this question is not already answered.
- [ ] I believe this is a legitimate question, not just a bug or feature request.
Your Question
I'm exploring LightRAG + Postgres for storage in a multi-tenant application and am running into challenges with managing data isolation between tenants.

What I've Considered

I noticed the workspace isolation option, which seems useful for defining different environments. However, it seems difficult to dynamically switch this configuration at runtime for each incoming API request: the persistent connection pools appear to prevent a clean switch from one tenant's database/schema to another's on a per-request basis.
My Question

Is there a recommended or existing way to manage a multi-tenant setup with LightRAG that I might be missing?
For example, is there a way to:
- Dynamically pass tenant-specific Postgres credentials/configuration to LightRAG (or its storage components) for each operation? Or
- Use a single database with workspace isolation that can easily be switched at runtime?
Any guidance or best practices you could share on this would be greatly appreciated. Thank you for the fantastic library!
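To make the goal concrete, here's a minimal sketch of the per-request routing I'm after (all names here are hypothetical illustrations, not LightRAG APIs): one server process, with the workspace resolved per incoming request.

```python
# Hypothetical sketch of the desired behavior: a single process maps each
# incoming request (identified here by an API key) to a tenant workspace.
class TenantRouter:
    """Maps an API key to a tenant-specific workspace name."""

    def __init__(self):
        self._tenants = {}  # api_key -> workspace

    def register(self, api_key: str, workspace: str) -> None:
        self._tenants[api_key] = workspace

    def workspace_for(self, api_key: str) -> str:
        # Fall back to a shared default workspace for unknown keys.
        return self._tenants.get(api_key, "default")

router = TenantRouter()
router.register("key-acme", "acme_corp")
print(router.workspace_for("key-acme"))  # acme_corp
print(router.workspace_for("unknown"))   # default
```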
Additional Context
No response
+1 on this. Would love to know the recommended approach here.
We've encountered the same challenge.
In the method `ClientManager.get_client()`:
```python
@classmethod
async def get_client(cls) -> PostgreSQLDB:
    async with cls._lock:
        if cls._instances["db"] is None:
            config = ClientManager.get_config()
            db = PostgreSQLDB(config)
            await db.initdb()
            await db.check_tables()
            cls._instances["db"] = db
            cls._instances["ref_count"] = 0
        cls._instances["ref_count"] += 1
        return cls._instances["db"]
```
This is a global singleton, so every select against the DB will always use the same workspace.
In postgres_impl.py, I suggest letting the workspace be passed programmatically as a function argument instead of being read from the class attribute (which never changes once initialized). This would help a lot when using the LightRAG core for a multi-tenant solution.
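A minimal sketch of that suggestion, using simplified hypothetical classes (not the actual postgres_impl.py code): the "suggested" shape lets one shared instance serve many tenants because the workspace travels with each call.

```python
# Current shape (simplified): workspace is frozen at construction time.
class PGStorageCurrent:
    def __init__(self, workspace: str):
        self.workspace = workspace

    def build_query(self):
        # parameterized SQL; the workspace cannot vary per call
        return "SELECT * FROM chunks WHERE workspace = $1", (self.workspace,)

# Suggested shape: workspace is an explicit per-call argument.
class PGStorageSuggested:
    def build_query(self, workspace: str):
        # one shared instance can serve many tenants without mutating
        # shared state between requests
        return "SELECT * FROM chunks WHERE workspace = $1", (workspace,)

shared = PGStorageSuggested()
print(shared.build_query("tenant_a")[1])  # ('tenant_a',)
print(shared.build_query("tenant_b")[1])  # ('tenant_b',)
```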
I don't understand why the DB-level workspace takes the highest priority:
```python
async def initialize(self):
    if self.db is None:
        self.db = await ClientManager.get_client()

    # Implement workspace priority: PostgreSQLDB.workspace > self.workspace > "default"
    if self.db.workspace:
        # Use PostgreSQLDB's workspace (highest priority)
        final_workspace = self.db.workspace
    elif hasattr(self, "workspace") and self.workspace:
        # Use storage class's workspace (medium priority)
        final_workspace = self.workspace
        self.db.workspace = final_workspace
    else:
        # Use "default" for compatibility (lowest priority)
        final_workspace = "default"
        self.db.workspace = final_workspace
```
And the DB object is a singleton instance.
+1
I have managed to solve the issue by overriding the relevant storage classes 👍
> I have managed to solve the issue by overriding the relevant storage classes 👍
That's great! When I tried overriding, I noticed all the storage classes come with a `@final` decorator, which makes subclassing impossible unless you patch or change the library code itself. Am I missing something, or can you explain how you managed to override them?
Sorry, "override" was the wrong term: you can create your own storage classes based on the PG implementation and replace the internal storage classes with your own in the LightRAG instance.
Ahh, I see! I was able to tinker with the ClientManager class to fix this. However, this does seem to be a library issue with some easy workarounds:
- Unless there is a special reason, the database credentials should not be read only from environment variables.
- ClientManager or the storage classes should accept the workspace as a parameter when the DB instance is instantiated, ensuring there are no race conditions.
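One possible workaround sketch along those lines (hypothetical names, not LightRAG's actual `ClientManager`): cache one DB client per tenant key instead of a single global singleton, take the config as an explicit argument, and guard creation with a lock to avoid races.

```python
import asyncio

class FakeDB:
    """Stand-in for PostgreSQLDB; real code would open a pool here."""
    def __init__(self, config: dict):
        self.config = config

class PerTenantClientManager:
    _clients: dict = {}
    _lock = asyncio.Lock()

    @classmethod
    async def get_client(cls, tenant_key: str, config: dict) -> FakeDB:
        async with cls._lock:
            if tenant_key not in cls._clients:
                # config is passed in explicitly rather than read from
                # environment variables at import time
                cls._clients[tenant_key] = FakeDB(config)
            return cls._clients[tenant_key]

async def demo():
    a = await PerTenantClientManager.get_client("acme", {"db": "acme_db"})
    b = await PerTenantClientManager.get_client("globex", {"db": "globex_db"})
    a_again = await PerTenantClientManager.get_client("acme", {"db": "acme_db"})
    # same tenant reuses its client; different tenants are isolated
    return a is a_again, a is b

print(asyncio.run(demo()))  # (True, False)
```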
+1
I'm looking at various RAG solutions right now, and the choice to make you run multiple server instances in order to separate workspaces is a very strange design decision to me. Having to set up a new instance every time someone wants a new workspace isn't viable when you want users to be able to create/delete an arbitrary number of isolated workspaces.
I'd much prefer something like ChromaDB's solution of just having a "collection" name that you can pass as a parameter to the API, and all collections are isolated from each other without having to run a new server instance for every new workspace.
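For reference, the collection-style shape being described can be sketched in a few lines (a toy in-memory model, not ChromaDB's or LightRAG's actual API): the collection name is just a parameter on every call, and each collection is created on first use and fully isolated.

```python
class Server:
    """Toy single-process server with named, isolated collections."""

    def __init__(self):
        self._collections = {}

    def collection(self, name: str) -> dict:
        # create on first use; no new server instance required
        return self._collections.setdefault(name, {})

    def insert(self, collection: str, doc_id: str, text: str) -> None:
        self.collection(collection)[doc_id] = text

    def query(self, collection: str, doc_id: str):
        # lookups never see other collections' data
        return self.collection(collection).get(doc_id)

srv = Server()
srv.insert("ws_alpha", "d1", "alpha doc")
srv.insert("ws_beta", "d1", "beta doc")
print(srv.query("ws_alpha", "d1"))  # alpha doc
print(srv.query("ws_beta", "d1"))   # beta doc
```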
But isn't ChromaDB's storage only for the vector embeddings? That said, you're absolutely right: having a separate instance per tenant is not scalable at all. A single instance with some patching applied in the ClientManager class will let you use the workspace parameter to isolate data efficiently. But since the data I'm dealing with is extremely critical and I cannot afford any mishaps with data mixing, I wanted to take some time to evaluate the performance and results of maintaining a single instance with proper workspace isolation.
Hi, I ran into this problem as well. How did you solve it? Can you share the code? :)
Hope you can provide a code example, or submit a PR
I found working_dir mentioned in https://github.com/HKUDS/LightRAG/issues/310, but I can't find much about its caveats or what it actually does.