LightRAG
Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage
Problem Statement
When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:
- Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, causing pipeline states from different workspaces to interfere with each other
- Lock Mechanism Deficiency: Existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, causing operations from different workspaces to block each other unnecessarily
- In-Memory JSON KV Storage Lacks Workspace Isolation: Related namespace functions don't provide workspace parameters, preventing true workspace isolation
Solution
1. Workspace Isolation for Pipeline Status
- Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
- Create an independent pipeline_status namespace for each workspace
- Namespace format: <workspace>:pipeline_status
2. Unified Workspace-Based Lock Mechanism
- Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
- Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
- Lock namespace: <workspace>:<storage_type>
- Lock key: Fixed as default_key
- Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention
3. New get_namespace_lock() Function
def get_namespace_lock(
namespace: str,
workspace: str | None = None,
enable_logging: bool = False
) -> NamespaceLock
- Simplifies namespace-level lock acquisition
- Automatically handles workspace and namespace combination
- Unified lock interface, replacing multiple independent locks
4. Add Workspace Parameter to All Namespace Operations
Updated function signatures to support workspace parameter:
- initialize_pipeline_status(workspace: str | None = None)
- get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
- get_update_flag(namespace: str, workspace: str | None = None)
- set_all_update_flags(namespace: str, workspace: str | None = None)
- clear_all_update_flags(namespace: str, workspace: str | None = None)
- get_all_update_flags_status(workspace: str | None = None)
- try_initialize_namespace(namespace: str, workspace: str | None = None)
5. Default Workspace Support (Backward Compatibility)
- Added global variable _default_workspace
- Added function set_default_workspace(workspace: str | None = None)
- Added function get_default_workspace() -> str
- Purpose: Maintain compatibility with legacy code that doesn't provide a workspace parameter
- Behavior: Automatically use the default workspace when the workspace parameter is None
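The fallback behavior can be sketched as follows. The bodies below are assumptions written for illustration, not the code in this PR, and `resolve_workspace` is a hypothetical helper that does not appear in the change list; it only shows where the None check would sit.

```python
from typing import Optional

# Assumed module-level default; the empty string stands for "no workspace".
_default_workspace: Optional[str] = None


def set_default_workspace(workspace: Optional[str] = None) -> None:
    """Record the workspace that callers fall back to."""
    global _default_workspace
    _default_workspace = workspace or ""


def get_default_workspace() -> str:
    """Return the recorded default, or "" if none was set."""
    return _default_workspace or ""


def resolve_workspace(workspace: Optional[str] = None) -> str:
    """Hypothetical helper: use the default only when no workspace is given."""
    return get_default_workspace() if workspace is None else workspace


set_default_workspace("tenant_a")
legacy_call = resolve_workspace()              # legacy caller, no workspace argument
explicit_call = resolve_workspace("tenant_b")  # an explicit workspace wins
```

Under these assumptions, code that never passes a workspace keeps working against the default, while new code can target any workspace explicitly.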
6. Unified Namespace Naming Convention
Added get_final_namespace() function:
def get_final_namespace(namespace: str, workspace: str | None = None) -> str
- Centralized logic for combining workspace and namespace
- Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
- Ensures consistent naming across all namespace operations
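The naming rule above is small enough to show in a few lines. The function below is a sketch of the stated format under a hypothetical name, `compose_namespace`; it is not the body of get_final_namespace(), which additionally handles the default workspace.

```python
def compose_namespace(namespace: str, workspace: str = "") -> str:
    """Prefix the namespace with the workspace unless the workspace is empty."""
    return f"{workspace}:{namespace}" if workspace else namespace


with_workspace = compose_namespace("pipeline_status", "tenant_a")
without_workspace = compose_namespace("pipeline_status")
```

Keeping this rule in one place means locks, update flags, and namespace data all derive the same key for a given workspace.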
7. Standardize empty workspace handling from "_" to "" across storage
- Unify empty workspace behavior by changing workspace from "_" to ""
- Fixed incorrect empty workspace detection in get_all_update_flags_status()
8. Auto-initialize pipeline status in initialize_storages()
- Auto-initialize pipeline status in the initialize_storages() method
- Remove manual initialize_pipeline_status() calls across the codebase
- Update error and warning messages for clarity
- Update docs and examples
Key Modified Files
- lightrag/kg/shared_storage.py: Core modification file
  - Added workspace isolation logic
  - Implemented get_namespace_lock()
  - Implemented get_final_namespace()
  - Added default workspace support
  - Added workspace parameter to all namespace operation functions
- Storage Implementation Files (using new lock mechanism):
  - lightrag/kg/json_kv_impl.py
  - lightrag/kg/json_doc_status_impl.py
  - lightrag/kg/nano_vector_db_impl.py
  - lightrag/kg/faiss_impl.py
  - lightrag/kg/networkx_impl.py
  - All storage implementations now use get_namespace_lock() instead of legacy locks
- API and Core Logic Files:
  - lightrag/lightrag.py: Set default workspace
  - lightrag/api/lightrag_server.py: Pipeline status initialization
  - lightrag/api/routers/document_routes.py: Use new namespace lock interface
Testing Recommendations
- Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
- Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
- Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
- Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks
Expected Outcomes
- Complete workspace-level isolation
- LightRAG instances with different workspaces can run concurrently without interference
- Pipeline status no longer interferes across workspaces
- Optimized lock granularity, reduced unnecessary lock contention
- 100% backward compatible with existing code
@codex review
Codex Review: Didn't find any major issues. Delightful!
@codex review
Codex Review: Didn't find any major issues. What shall we delve into next?