[Question]: How to implement multi-tenant solutions?

Open harish-trench opened this issue 3 months ago • 12 comments

Do you need to ask a question?

  • [x] I have searched the existing question and discussions and this question is not already answered.
  • [ ] I believe this is a legitimate question, not just a bug or feature request.

Your Question

I'm exploring using LightRAG + POSTGRES for storage in a multi-tenant application and am running into some challenges with managing data isolation between tenants.

What I've Considered

I noticed the workspace isolation option, which seems useful for defining different environments. However, it feels difficult to dynamically switch this configuration at runtime for each incoming API request. The persistence of the connection pools seems to prevent a clean switch from one tenant's database/schema to another's on a per-request basis.

My Question

Is there a recommended or existing way to manage a multi-tenant setup with LightRAG that I might be missing?

For example, is there a way to:

  • Dynamically pass tenant-specific Postgres credentials/configurations to LightRAG or other components for each operation, or
  • Use a single database with workspace isolation that can easily be changed at runtime?

Any guidance or best practices you could share on this would be greatly appreciated. Thank you for the fantastic library!

Additional Context

No response

harish-trench avatar Sep 22 '25 11:09 harish-trench

+1 on this. Would love to know the recommended approach here.

BEEFF avatar Sep 22 '25 13:09 BEEFF

We've encountered the same challenge. Consider the method ClientManager.get_client():

    @classmethod
    async def get_client(cls) -> PostgreSQLDB:
        async with cls._lock:
            if cls._instances["db"] is None:
                config = ClientManager.get_config()
                db = PostgreSQLDB(config)
                await db.initdb()
                await db.check_tables()
                cls._instances["db"] = db
                cls._instances["ref_count"] = 0
            cls._instances["ref_count"] += 1
            return cls._instances["db"]

This client is global, so it will always return the same workspace each time you select from the DB.

In postgres_impl.py, I suggest that the workspace be accepted as a programmatically passed-in function argument instead of being read from the class attribute (which never changes once initialized).

This would help a lot when using the LightRAG core for a multi-tenant solution.
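As a rough illustration of that suggestion, here is a minimal sketch in which every storage call receives the workspace as an explicit argument. It is not LightRAG code: sqlite3 stands in for Postgres so the snippet runs anywhere, and the `docs` table and the `make_db`, `upsert_doc`, and `get_docs` helpers are hypothetical names chosen for this example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a workspace-scoped table.

    sqlite3 is only a stand-in for Postgres; the 'docs' table is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        " workspace TEXT NOT NULL,"
        " id TEXT NOT NULL,"
        " content TEXT,"
        " PRIMARY KEY (workspace, id))"
    )
    return conn


def upsert_doc(conn: sqlite3.Connection, workspace: str, doc_id: str, content: str) -> None:
    """Write a document into the workspace given by the caller."""
    conn.execute(
        "INSERT OR REPLACE INTO docs (workspace, id, content) VALUES (?, ?, ?)",
        (workspace, doc_id, content),
    )


def get_docs(conn: sqlite3.Connection, workspace: str) -> list:
    """Read only the rows of the workspace passed in for this call."""
    cur = conn.execute(
        "SELECT id, content FROM docs WHERE workspace = ? ORDER BY id",
        (workspace,),
    )
    return cur.fetchall()
```

Because the workspace travels with each call, one shared connection can serve several tenants without any shared attribute deciding which rows are visible.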

I have no idea why the DB workspace takes higher priority:

    async def initialize(self):
        if self.db is None:
            self.db = await ClientManager.get_client()
            # Implement workspace priority: PostgreSQLDB.workspace > self.workspace > "default"
            if self.db.workspace:
                # Use PostgreSQLDB's workspace (highest priority)
                final_workspace = self.db.workspace
            elif hasattr(self, "workspace") and self.workspace:
                # Use storage class's workspace (medium priority)
                final_workspace = self.workspace
                self.db.workspace = final_workspace
            else:
                # Use "default" for compatibility (lowest priority)
                final_workspace = "default"
                self.db.workspace = final_workspace

And the DB object is a singleton instance.
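To make the effect concrete, here is a toy reproduction of the priority logic quoted above. `SharedDB` and `ToyStorage` are hypothetical stand-ins written for this example, not the real PostgreSQLDB or storage classes; only the if/elif/else order is taken from the snippet.

```python
class SharedDB:
    """Stand-in for the singleton DB object; it only carries a workspace."""

    def __init__(self, workspace=None):
        self.workspace = workspace


class ToyStorage:
    """Stand-in for a storage class that is handed the shared DB object."""

    def __init__(self, workspace=None):
        self.workspace = workspace
        self.db = None
        self.final_workspace = None

    def initialize(self, shared_db):
        self.db = shared_db
        if self.db.workspace:
            # The DB's workspace wins (highest priority)
            self.final_workspace = self.db.workspace
        elif self.workspace:
            # The storage's own workspace is used and written back to the DB
            self.final_workspace = self.workspace
            self.db.workspace = self.final_workspace
        else:
            # Fallback for compatibility (lowest priority)
            self.final_workspace = "default"
            self.db.workspace = self.final_workspace
        return self.final_workspace


def demo():
    shared = SharedDB()  # one shared object, as with a singleton
    first = ToyStorage(workspace="tenant_a")
    second = ToyStorage(workspace="tenant_b")
    return first.initialize(shared), second.initialize(shared)
```

In this toy version, `demo()` returns `("tenant_a", "tenant_a")`: the first storage writes its workspace onto the shared object, and the second storage's own workspace is then ignored.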

superfive666 avatar Sep 23 '25 08:09 superfive666

+1

superuely avatar Sep 23 '25 21:09 superuely

I have managed to solve the issue by overriding the relevant storage classes 👍

superfive666 avatar Sep 24 '25 09:09 superfive666

That's great! When I tried overriding, I noticed that all the storage classes come with a @final decorator, which makes it impossible to override them unless you patch or change the library code itself. Am I missing something here, or can you explain how you managed to override them?

harish-trench avatar Sep 24 '25 11:09 harish-trench

Sorry for the wrong use of the term "override". You can create your own storage classes based on the PG implementation and replace the internal storage classes with your own in the LightRAG instance.

superfive666 avatar Sep 26 '25 01:09 superfive666

Ahh, I see! I was able to tinker with the ClientManager class to fix this. However, this seems to be an issue with the library that has some easy workarounds.

  1. Unless there is a special reason, the database credentials should not be read from the environment variables.
  2. ClientManager or the storage classes must be able to take the workspace as a parameter when the DB instance is instantiated, ensuring there are no race conditions.
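
A minimal sketch of the two points above, under stated assumptions: this is not the patched ClientManager, and `TenantConfig`, `FakeDB`, and `WorkspaceClientRegistry` are hypothetical names. The configuration is passed in by the caller rather than read from the environment, and one client is kept per workspace behind an asyncio lock so that concurrent requests cannot create duplicates.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant settings supplied by the caller, not by env vars."""
    workspace: str
    dsn: str


class FakeDB:
    """Stand-in for a real database client; it does not open a connection."""

    def __init__(self, config: TenantConfig):
        self.config = config
        self.workspace = config.workspace

    async def initdb(self) -> None:
        await asyncio.sleep(0)  # placeholder for pool creation


class WorkspaceClientRegistry:
    """Keeps one client per workspace instead of one global client."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._clients = {}

    async def get_client(self, config: TenantConfig) -> FakeDB:
        # The lock makes check-then-create atomic across concurrent tasks
        async with self._lock:
            client = self._clients.get(config.workspace)
            if client is None:
                client = FakeDB(config)
                await client.initdb()
                self._clients[config.workspace] = client
            return client


async def demo():
    registry = WorkspaceClientRegistry()
    cfg_a = TenantConfig("tenant_a", "postgresql://example/a")
    cfg_b = TenantConfig("tenant_b", "postgresql://example/b")
    a1, a2, b = await asyncio.gather(
        registry.get_client(cfg_a),
        registry.get_client(cfg_a),
        registry.get_client(cfg_b),
    )
    return a1 is a2, a1 is b, b.workspace
```

A single lock is the simplest choice here; it serializes client creation for all tenants, which may or may not be acceptable depending on how slow initialization is.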

harish-trench avatar Sep 26 '25 09:09 harish-trench

+1

I'm looking at various RAG solutions right now, and requiring multiple server instances in order to separate workspaces is a very strange design decision to me. Having to set up a new instance every time someone wants to make a new workspace isn't viable when you want users to be able to create/delete an arbitrary number of isolated workspaces.

I'd much prefer something like ChromaDB's solution of just having a "collection" name that you can pass as a parameter to the API, and all collections are isolated from each other without having to run a new server instance for every new workspace.

Mobious avatar Oct 11 '25 22:10 Mobious

But isn't ChromaDB's storage only for the vector embeddings? By the way, you are absolutely right. Having separate instances for each tenant is not at all scalable. Having one instance with some patching applied to the ClientManager class will let you use the workspace parameter to isolate data. But since the data I am dealing with is extremely critical and I cannot afford any mishaps with mixing of data, I wanted to give it some time to see the performance and results of maintaining a single instance with proper workspace isolation.

harish-trench avatar Oct 13 '25 03:10 harish-trench

Hi, I also encountered this problem. How did you solve it? Can you share the code? :)

loliw avatar Oct 14 '25 08:10 loliw

I hope you can provide a code example or submit a PR.

huangbhan avatar Nov 13 '25 12:11 huangbhan

I found working_dir mentioned here: https://github.com/HKUDS/LightRAG/issues/310 but I can't find much about its caveats or what it actually does.

xtfocus avatar Nov 24 '25 08:11 xtfocus