Feat: Add Workspace Isolation to Resolve Multi-Instance Concurrency Interference

Open danielaskdd opened this issue 2 weeks ago • 6 comments

🎯 Problem Statement

When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:

  1. Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, so pipeline state from one workspace interferes with the others
  2. Lock Mechanism Deficiency: The existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, so operations in different workspaces block each other unnecessarily
  3. In-Memory JSON KV Storage Lacks Workspace Isolation: The related namespace functions take no workspace parameter, which prevents true workspace isolation

✨ Solution

1. Workspace Isolation for Pipeline Status

  • Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
  • Create an independent pipeline_status namespace for each workspace
  • Namespace format: <workspace>:pipeline_status
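
To illustrate the idea, here is a minimal sketch of a per-workspace status registry. It is not the code from this PR: the registry `_shared_dicts`, the helper names, and the status fields `busy` and `history_messages` are assumptions chosen for the example. Only the `<workspace>:pipeline_status` key format comes from the description above.

```python
from __future__ import annotations

# Hypothetical in-memory registry: one status dict per "<workspace>:pipeline_status".
_shared_dicts: dict[str, dict] = {}


def pipeline_status_namespace(workspace: str) -> str:
    """Build the namespace key in the format described above."""
    return f"{workspace}:pipeline_status" if workspace else "pipeline_status"


def init_pipeline_status(workspace: str = "") -> dict:
    """Create the status dict for a workspace once and return it afterwards."""
    key = pipeline_status_namespace(workspace)
    # setdefault keeps an existing dict, so repeated initialization is harmless.
    return _shared_dicts.setdefault(key, {"busy": False, "history_messages": []})


status_a = init_pipeline_status("tenant_a")
status_b = init_pipeline_status("tenant_b")
status_a["busy"] = True  # only tenant_a's pipeline is marked busy
```

Because each workspace resolves to its own key, marking one pipeline busy leaves the other workspace's status untouched.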

2. Unified Workspace-Based Lock Mechanism

  • Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
  • Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
  • Lock namespace: <workspace>:<storage_type>
  • Lock key: Fixed as default_key
  • Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention

3. New get_namespace_lock() Function

def get_namespace_lock(
    namespace: str, 
    workspace: str | None = None, 
    enable_logging: bool = False
) -> NamespaceLock
  • Simplifies namespace-level lock acquisition
  • Automatically handles workspace and namespace combination
  • Unified lock interface, replacing multiple independent locks
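
The following sketch shows how such an interface might be used. `NamespaceLockSketch` and `get_namespace_lock_sketch` are stand-ins written for this example under the assumption that the returned object works as an async context manager; they are not the PR's `NamespaceLock` or `get_namespace_lock()`.

```python
from __future__ import annotations

import asyncio

_locks: dict[str, asyncio.Lock] = {}  # hypothetical lock registry


class NamespaceLockSketch:
    """Stand-in for NamespaceLock: an async context manager over one named lock."""

    def __init__(self, final_namespace: str, enable_logging: bool = False) -> None:
        self.final_namespace = final_namespace
        self.enable_logging = enable_logging

    async def __aenter__(self) -> "NamespaceLockSketch":
        lock = _locks.setdefault(self.final_namespace, asyncio.Lock())
        await lock.acquire()
        if self.enable_logging:
            print(f"acquired {self.final_namespace}")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        _locks[self.final_namespace].release()


def get_namespace_lock_sketch(
    namespace: str, workspace: str | None = None, enable_logging: bool = False
) -> NamespaceLockSketch:
    """Combine workspace and namespace, then wrap the matching lock."""
    ws = workspace or ""
    final = f"{ws}:{namespace}" if ws else namespace
    return NamespaceLockSketch(final, enable_logging)


async def update_status(workspace: str, log: list[str]) -> None:
    async with get_namespace_lock_sketch("pipeline_status", workspace):
        log.append(f"{workspace}:start")
        await asyncio.sleep(0.01)
        log.append(f"{workspace}:end")


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(
        update_status("a", log), update_status("a", log), update_status("b", log)
    )
    return log


events = asyncio.run(main())
```

In this sketch the two tasks in workspace `a` run one after the other, while the task in workspace `b` starts without waiting for them.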

4. Add Workspace Parameter to All Namespace Operations

Updated function signatures to support workspace parameter:

  • initialize_pipeline_status(workspace: str | None = None)
  • get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
  • get_update_flag(namespace: str, workspace: str | None = None)
  • set_all_update_flags(namespace: str, workspace: str | None = None)
  • clear_all_update_flags(namespace: str, workspace: str | None = None)
  • get_all_update_flags_status(workspace: str | None = None)
  • try_initialize_namespace(namespace: str, workspace: str | None = None)

5. Default Workspace Support (Backward Compatibility)

  • Added global variable _default_workspace
  • Added function set_default_workspace(workspace: str | None = None)
  • Added function get_default_workspace() -> str
  • Purpose: Maintain compatibility with legacy code that doesn't provide workspace parameter
  • Behavior: Automatically use default workspace when workspace parameter is None
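
The fallback can be sketched as follows. The function names follow the list above, but the bodies are assumptions, in particular the choice to return an empty string when no default has been set. `resolve_workspace` is a hypothetical helper added only for the example.

```python
from __future__ import annotations

_default_workspace: str | None = None


def set_default_workspace(workspace: str | None = None) -> None:
    """Record the workspace used when callers pass workspace=None."""
    global _default_workspace
    _default_workspace = workspace or ""


def get_default_workspace() -> str:
    # Assumption: an unset default behaves like the empty workspace.
    return _default_workspace or ""


def resolve_workspace(workspace: str | None = None) -> str:
    """Hypothetical helper: an explicit value wins, None falls back to the default."""
    return get_default_workspace() if workspace is None else workspace


unset = resolve_workspace()          # nothing configured yet
set_default_workspace("tenant_a")
legacy = resolve_workspace()         # legacy call without a workspace argument
explicit = resolve_workspace("tenant_b")
```

Legacy call sites that never pass a workspace keep working, and they land in whichever workspace was registered as the default.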

6. Unified Namespace Naming Convention

Added get_final_namespace() function:

def get_final_namespace(namespace: str, workspace: str | None = None) -> str
  • Centralized logic for combining workspace and namespace
  • Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
  • Ensures consistent naming across all namespace operations
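
A minimal sketch of the combining rule, assuming `None` falls back to a default workspace as described in section 5. `DEFAULT_WORKSPACE` and `final_namespace_sketch` are illustrative names, not the PR's code.

```python
from __future__ import annotations

DEFAULT_WORKSPACE = ""  # stand-in for the configured default workspace


def final_namespace_sketch(namespace: str, workspace: str | None = None) -> str:
    """Return "<workspace>:<namespace>", or the bare namespace if workspace is empty."""
    ws = DEFAULT_WORKSPACE if workspace is None else workspace
    return f"{ws}:{namespace}" if ws else namespace


with_workspace = final_namespace_sketch("pipeline_status", "tenant_a")
without_workspace = final_namespace_sketch("pipeline_status", "")
```

Routing every lookup through one function like this keeps lock names, data namespaces, and update flags from drifting into different key formats.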

7. Standardize Empty Workspace Handling from "_" to "" Across Storage

  • Unify empty-workspace behavior by changing the empty workspace value from "_" to ""
  • Fix incorrect empty workspace detection in get_all_update_flags_status()

8. Auto-Initialize Pipeline Status in initialize_storages()

  • Initialize pipeline status automatically inside the initialize_storages() method
  • Remove manual initialize_pipeline_status() calls across the codebase
  • Update error and warning messages for clarity
  • Update docs and examples

๐Ÿ“ Key Modified Files

  • lightrag/kg/shared_storage.py: Core modification file

    • Added workspace isolation logic
    • Implemented get_namespace_lock()
    • Implemented get_final_namespace()
    • Added default workspace support
    • Added workspace parameter to all namespace operation functions
  • Storage Implementation Files (using new lock mechanism):

    • lightrag/kg/json_kv_impl.py
    • lightrag/kg/json_doc_status_impl.py
    • lightrag/kg/nano_vector_db_impl.py
    • lightrag/kg/faiss_impl.py
    • lightrag/kg/networkx_impl.py
    • All storage implementations now use get_namespace_lock() instead of legacy locks
  • API and Core Logic Files:

    • lightrag/lightrag.py: Set default workspace
    • lightrag/api/lightrag_server.py: Pipeline status initialization
    • lightrag/api/routers/document_routes.py: Use new namespace lock interface

🧪 Testing Recommendations

  1. Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
  2. Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
  3. Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
  4. Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks
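
One way to approach the first and fourth recommendations is a check that fails fast when two workspaces end up sharing a lock. The sketch below uses plain `asyncio` locks and hypothetical helper names (`cross_workspace_check`, `run_check`). It would need to be adapted to call `get_namespace_lock()` against the real storage layer.

```python
import asyncio


async def cross_workspace_check(lock_for) -> bool:
    """Return True if holding workspace A's lock does not block workspace B."""
    b_done = asyncio.Event()

    async def worker_a() -> None:
        async with lock_for("ws_a"):
            # Hold A's lock until B has finished its own critical section.
            await b_done.wait()

    async def worker_b() -> None:
        async with lock_for("ws_b"):
            b_done.set()

    try:
        await asyncio.wait_for(asyncio.gather(worker_a(), worker_b()), timeout=0.2)
        return True
    except asyncio.TimeoutError:
        # B never got its lock while A was waiting: the locks are not isolated.
        return False


def run_check(shared: bool) -> bool:
    async def main() -> bool:
        locks: dict = {}

        def lock_for(workspace: str) -> asyncio.Lock:
            # shared=True imitates the legacy single global lock.
            key = "global" if shared else workspace
            return locks.setdefault(key, asyncio.Lock())

        return await cross_workspace_check(lock_for)

    return asyncio.run(main())


isolated_ok = run_check(shared=False)
global_ok = run_check(shared=True)
```

The `shared=True` run imitates the legacy global lock and times out, which gives the test a negative control alongside the isolated case.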

🎉 Expected Outcomes

  • ✅ Complete workspace-level isolation
  • ✅ LightRAG instances with different workspaces can run concurrently without interference
  • ✅ Pipeline status no longer interferes across workspaces
  • ✅ Optimized lock granularity, reduced unnecessary lock contention
  • ✅ 100% backward compatible with existing code

danielaskdd · Nov 16 '25 18:11

@codex review

danielaskdd · Nov 16 '25 18:11

@codex review

danielaskdd · Nov 16 '25 20:11

@codex review

danielaskdd · Nov 16 '25 22:11

@codex review

danielaskdd · Nov 16 '25 22:11

@codex review

danielaskdd · Nov 16 '25 23:11

Codex Review: Didn't find any major issues. Keep them coming!

โ„น๏ธ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danielaskdd Hi! I need to fix the commit author attribution before this PR merges.

Issue: Commit 5f153582 uses email [email protected] which is NOT linked to my GitHub account @BukeLy. This means I won't get proper credit for my contribution.

My correct GitHub email: [email protected]

Request: Could you please update the commit author to use my verified email? This is a quick fix:

git checkout dev-workspace-isolation
git rebase -i 5f153582^
# Change 'pick' to 'edit' for commit 5f153582
git commit --amend --author="BukeLy <[email protected]>" --no-edit
git rebase --continue
git push --force

Why this matters:
Without this fix, GitHub won't attribute the commit to my account and I won't appear in Contributors despite doing this work in PR #2353.

I've already updated my own PR branch with the correct email. Thank you for your help! 🙏

Alternative (if rebasing is too risky):
Add Co-authored-by line:
Co-authored-by: BukeLy <[email protected]>

BukeLy · Nov 17 '25 03:11