Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage

Open danielaskdd opened this issue 3 weeks ago • 6 comments

🎯 Problem Statement

When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:

  1. Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, so pipeline states from different workspaces interfere with each other
  2. Lock Mechanism Deficiency: The existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are global rather than per-workspace, so operations in different workspaces block each other unnecessarily
  3. In-Memory JSON KV Storage Lacks Workspace Isolation: The related namespace functions take no workspace parameter, which prevents true workspace isolation

✨ Solution

1. Workspace Isolation for Pipeline Status

  • Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
  • Create independent pipeline_status namespace for each workspace
  • Namespace format: <workspace>:pipeline_status
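A minimal sketch of the idea, assuming a plain in-memory dictionary as the backing store. The helper names (`pipeline_status_key`, `get_pipeline_status`) and the status fields are hypothetical and only illustrate the `<workspace>:pipeline_status` keying; they are not the code in this PR.

```python
from typing import Any

# Hypothetical registry: one status dict per "<workspace>:pipeline_status" key.
_shared_dicts: dict[str, dict[str, Any]] = {}


def pipeline_status_key(workspace: str) -> str:
    """Build the namespace key in the <workspace>:pipeline_status format."""
    return f"{workspace}:pipeline_status"


def get_pipeline_status(workspace: str) -> dict[str, Any]:
    """Return the status dict for a workspace, creating it on first use."""
    key = pipeline_status_key(workspace)
    if key not in _shared_dicts:
        # Field names are illustrative assumptions, kept in memory only.
        _shared_dicts[key] = {"busy": False, "latest_message": ""}
    return _shared_dicts[key]


status_a = get_pipeline_status("tenant_a")
status_b = get_pipeline_status("tenant_b")
status_a["busy"] = True  # does not touch tenant_b's status
```

Because each workspace resolves to its own key, marking one pipeline as busy leaves every other workspace's status untouched.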

2. Unified Workspace-Based Lock Mechanism

  • Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
  • Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
  • Lock namespace: <workspace>:<storage_type>
  • Lock key: Fixed as default_key
  • Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention
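The effect of keying locks by `<workspace>:<storage_type>` can be sketched with plain `asyncio.Lock` objects. `KeyedLockRegistry` and the `kv_store` storage type below are hypothetical stand-ins for illustration, not the `_storage_keyed_lock` implementation itself.

```python
import asyncio


class KeyedLockRegistry:
    """Illustrative stand-in: one asyncio.Lock per (namespace, key) pair."""

    def __init__(self) -> None:
        self._locks: dict[tuple[str, str], asyncio.Lock] = {}

    def get(self, namespace: str, key: str = "default_key") -> asyncio.Lock:
        slot = (namespace, key)
        if slot not in self._locks:
            self._locks[slot] = asyncio.Lock()
        return self._locks[slot]


async def demo() -> list[str]:
    registry = KeyedLockRegistry()
    events: list[str] = []

    async def worker(workspace: str) -> None:
        # Lock namespace follows the <workspace>:<storage_type> format.
        async with registry.get(f"{workspace}:kv_store"):
            events.append(f"{workspace} start")
            await asyncio.sleep(0.01)
            events.append(f"{workspace} end")

    # Different workspaces use different locks, so both workers can
    # enter their critical sections before either one leaves.
    await asyncio.gather(worker("a"), worker("b"))
    return events


events = asyncio.run(demo())
```

With a single global lock, the second worker could not start until the first had finished; with per-workspace locks, both start before either ends.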

3. New get_namespace_lock() Function

def get_namespace_lock(
    namespace: str, 
    workspace: str | None = None, 
    enable_logging: bool = False
) -> NamespaceLock
  • Simplifies namespace-level lock acquisition
  • Automatically handles workspace and namespace combination
  • Unified lock interface, replacing multiple independent locks
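As a rough illustration of how a caller might use such an interface, the sketch below defines stand-ins with the same names and signature. It assumes the returned object is an async context manager and that `None` is treated as an empty workspace. The bodies are assumptions made for this example, not the code in `lightrag/kg/shared_storage.py`.

```python
from __future__ import annotations

import asyncio


class NamespaceLock:
    """Stand-in lock object usable with `async with` (an assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> NamespaceLock:
        await self._lock.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._lock.release()


_locks: dict[str, NamespaceLock] = {}


def get_namespace_lock(
    namespace: str,
    workspace: str | None = None,
    enable_logging: bool = False,  # accepted but ignored in this sketch
) -> NamespaceLock:
    ws = workspace or ""
    final = f"{ws}:{namespace}" if ws else namespace
    if final not in _locks:
        _locks[final] = NamespaceLock(final)
    return _locks[final]


async def main() -> tuple[bool, bool]:
    a1 = get_namespace_lock("pipeline_status", workspace="a")
    a2 = get_namespace_lock("pipeline_status", workspace="a")
    b = get_namespace_lock("pipeline_status", workspace="b")
    async with a1:
        # Workspace "b" has its own lock, so it can be taken while a1 is held.
        async with b:
            pass
    return a1 is a2, a1 is b


same_workspace, cross_workspace = asyncio.run(main())
```

The caller passes only a namespace and a workspace; building the combined lock name stays inside the helper.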

4. Add Workspace Parameter to All Namespace Operations

Updated function signatures to support workspace parameter:

  • initialize_pipeline_status(workspace: str | None = None)
  • get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
  • get_update_flag(namespace: str, workspace: str | None = None)
  • set_all_update_flags(namespace: str, workspace: str | None = None)
  • clear_all_update_flags(namespace: str, workspace: str | None = None)
  • get_all_update_flags_status(workspace: str | None = None)
  • try_initialize_namespace(namespace: str, workspace: str | None = None)

5. Default Workspace Support (Backward Compatibility)

  • Added global variable _default_workspace
  • Added function set_default_workspace(workspace: str | None = None)
  • Added function get_default_workspace() -> str
  • Purpose: Maintain compatibility with legacy code that doesn't provide workspace parameter
  • Behavior: Automatically use default workspace when workspace parameter is None
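The fallback behavior could look roughly like the following. The function names and the global come from the list above, but the bodies are assumptions, and `resolve_workspace` is a hypothetical helper added only to show the `None` fallback.

```python
from __future__ import annotations

_default_workspace: str | None = None


def set_default_workspace(workspace: str | None = None) -> None:
    """Record the workspace used when callers pass workspace=None."""
    global _default_workspace
    _default_workspace = workspace


def get_default_workspace() -> str:
    """Return the default workspace; an unset default is assumed to mean ""."""
    return _default_workspace or ""


def resolve_workspace(workspace: str | None = None) -> str:
    """Hypothetical helper: fall back to the default only when None is given."""
    return get_default_workspace() if workspace is None else workspace


set_default_workspace("team_a")
legacy_call = resolve_workspace()            # no workspace passed
explicit_call = resolve_workspace("team_b")  # explicit value wins
```

Legacy call sites that never pass a workspace keep working, and they all land in whichever workspace was registered as the default.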

6. Unified Namespace Naming Convention

Added get_final_namespace() function:

def get_final_namespace(namespace: str, workspace: str | None = None) -> str
  • Centralized logic for combining workspace and namespace
  • Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
  • Ensures consistent naming across all namespace operations
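A minimal sketch of the naming rule, assuming `None` is treated like an empty workspace. Any default-workspace lookup is omitted here, so this is an illustration of the format rather than the function as merged.

```python
from __future__ import annotations


def get_final_namespace(namespace: str, workspace: str | None = None) -> str:
    """Combine workspace and namespace as <workspace>:<namespace>.

    Falls back to the bare namespace when the workspace is empty.
    """
    ws = workspace or ""
    return f"{ws}:{namespace}" if ws else namespace


with_workspace = get_final_namespace("pipeline_status", "tenant_a")
without_workspace = get_final_namespace("pipeline_status", "")
```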

7. Standardize empty workspace handling from "_" to "" across storage

  • Unify empty workspace behavior by changing workspace from "_" to ""
  • Fixed incorrect empty workspace detection in get_all_update_flags_status()

8. Auto-initialize pipeline status in initialize_storages()

  • Auto-initialize pipeline status in the initialize_storages() method
  • Remove manual initialize_pipeline_status() calls across the codebase
  • Update error and warning messages for clarity
  • Update docs and examples

๐Ÿ“ Key Modified Files

  • lightrag/kg/shared_storage.py: Core modification file

    • Added workspace isolation logic
    • Implemented get_namespace_lock()
    • Implemented get_final_namespace()
    • Added default workspace support
    • Added workspace parameter to all namespace operation functions
  • Storage Implementation Files (using new lock mechanism):

    • lightrag/kg/json_kv_impl.py
    • lightrag/kg/json_doc_status_impl.py
    • lightrag/kg/nano_vector_db_impl.py
    • lightrag/kg/faiss_impl.py
    • lightrag/kg/networkx_impl.py
    • All storage implementations now use get_namespace_lock() instead of legacy locks
  • API and Core Logic Files:

    • lightrag/lightrag.py: Set default workspace
    • lightrag/api/lightrag_server.py: Pipeline status initialization
    • lightrag/api/routers/document_routes.py: Use new namespace lock interface

🧪 Testing Recommendations

  1. Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
  2. Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
  3. Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
  4. Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks
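One possible shape for the lock mechanism test is sketched below. The harness name `run_contention_check` and its parameters are hypothetical, and the plain `asyncio.Lock` factory stands in for whatever lock lookup the real test would exercise.

```python
import asyncio
from collections import defaultdict


async def run_contention_check(get_lock, workspaces, workers_per_ws=3, timeout=2.0):
    """Run several workers per workspace and record peak concurrency per workspace."""
    active = defaultdict(int)
    peak = defaultdict(int)

    async def worker(ws):
        async with get_lock(ws):
            active[ws] += 1
            peak[ws] = max(peak[ws], active[ws])
            await asyncio.sleep(0.01)
            active[ws] -= 1

    tasks = [worker(ws) for ws in workspaces for _ in range(workers_per_ws)]
    # wait_for raises a timeout error if the workers hang, e.g. on a deadlock.
    await asyncio.wait_for(asyncio.gather(*tasks), timeout=timeout)
    return dict(peak)


async def main():
    locks = {}

    def get_lock(ws):
        # Stand-in factory: one lock per workspace.
        if ws not in locks:
            locks[ws] = asyncio.Lock()
        return locks[ws]

    return await run_contention_check(get_lock, ["a", "b"])


peak = asyncio.run(main())
```

A peak of 1 per workspace shows that workers inside one workspace are serialized, and finishing within the timeout shows that nothing deadlocked.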

🎉 Expected Outcomes

  • ✅ Complete workspace-level isolation
  • ✅ LightRAG instances with different workspaces can run concurrently without interference
  • ✅ Pipeline status no longer interferes across workspaces
  • ✅ Optimized lock granularity, reduced unnecessary lock contention
  • ✅ 100% backward compatible with existing code

danielaskdd avatar Nov 17 '25 05:11 danielaskdd

@codex review

danielaskdd avatar Nov 17 '25 05:11 danielaskdd

@codex review

danielaskdd avatar Nov 17 '25 05:11 danielaskdd

@codex review

danielaskdd avatar Nov 17 '25 06:11 danielaskdd

@codex review

danielaskdd avatar Nov 17 '25 06:11 danielaskdd

@codex review

danielaskdd avatar Nov 17 '25 07:11 danielaskdd

Codex Review: Didn't find any major issues. Delightful!

โ„น๏ธ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codex review

danielaskdd avatar Nov 18 '25 03:11 danielaskdd

@codex review

danielaskdd avatar Nov 18 '25 04:11 danielaskdd

@codex review

danielaskdd avatar Nov 18 '25 04:11 danielaskdd

@codex review

danielaskdd avatar Nov 18 '25 05:11 danielaskdd

@codex review

danielaskdd avatar Nov 18 '25 06:11 danielaskdd

Codex Review: Didn't find any major issues. What shall we delve into next?

โ„น๏ธ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with ๐Ÿ‘.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".