LightRAG
Feat: Add Workspace Isolation to Resolve Multi-Instance Concurrency Interference
🎯 Problem Statement
When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:
- Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, causing pipeline states from different workspaces to interfere with each other
- Lock Mechanism Deficiency: Existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, causing operations from different workspaces to block each other unnecessarily
- In-Memory JSON KV Storage Lacks Workspace Isolation: Related namespace functions don't accept a workspace parameter, preventing true workspace isolation
✨ Solution
1. Workspace Isolation for Pipeline Status
- Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
- Create an independent pipeline_status namespace for each workspace
- Namespace format: <workspace>:pipeline_status
2. Unified Workspace-Based Lock Mechanism
- Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
- Introduce a unified keyed lock mechanism, implemented via _storage_keyed_lock
- Lock namespace: <workspace>:<storage_type>
- Lock key: fixed as default_key
- Benefits: fine-grained workspace-level isolation, avoiding cross-workspace lock contention
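The effect of keying locks by workspace can be illustrated with a small asyncio sketch. The `KeyedLock` class and the `GraphDB` storage type name are hypothetical stand-ins; the real `_storage_keyed_lock` may be implemented quite differently (for example, with multi-process support):

```python
import asyncio


class KeyedLock:
    """Hypothetical stand-in: one asyncio.Lock per (namespace, key) pair."""

    def __init__(self) -> None:
        self._locks: dict[tuple[str, str], asyncio.Lock] = {}

    def get(self, namespace: str, key: str = "default_key") -> asyncio.Lock:
        # Create the lock lazily so it is built inside the running event loop.
        return self._locks.setdefault((namespace, key), asyncio.Lock())


async def demo() -> list[str]:
    locks = KeyedLock()
    events: list[str] = []

    async def worker(workspace: str, name: str) -> None:
        # Lock namespace follows the "<workspace>:<storage_type>" format.
        async with locks.get(f"{workspace}:GraphDB"):
            events.append(f"{name} start")
            await asyncio.sleep(0.01)
            events.append(f"{name} end")

    await asyncio.gather(
        worker("ws_a", "a1"), worker("ws_a", "a2"), worker("ws_b", "b1")
    )
    return events


events = asyncio.run(demo())
```

In this sketch the two `ws_a` workers run one after the other, while the `ws_b` worker starts without waiting for either of them.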
3. New get_namespace_lock() Function
def get_namespace_lock(
namespace: str,
workspace: str | None = None,
enable_logging: bool = False
) -> NamespaceLock
- Simplifies namespace-level lock acquisition
- Automatically handles workspace and namespace combination
- Unified lock interface, replacing multiple independent locks
4. Add Workspace Parameter to All Namespace Operations
Updated function signatures to support workspace parameter:
- initialize_pipeline_status(workspace: str | None = None)
- get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
- get_update_flag(namespace: str, workspace: str | None = None)
- set_all_update_flags(namespace: str, workspace: str | None = None)
- clear_all_update_flags(namespace: str, workspace: str | None = None)
- get_all_update_flags_status(workspace: str | None = None)
- try_initialize_namespace(namespace: str, workspace: str | None = None)
5. Default Workspace Support (Backward Compatibility)
- Added global variable _default_workspace
- Added function set_default_workspace(workspace: str | None = None)
- Added function get_default_workspace() -> str
- Purpose: Maintain compatibility with legacy code that doesn't provide a workspace parameter
- Behavior: Automatically use the default workspace when the workspace parameter is None
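The fallback behavior could look roughly like the sketch below. The helper `resolve_workspace` is hypothetical, and returning an empty string when no default has been set is an assumption (chosen to match the empty-workspace convention in item 7); `Optional[str]` stands in for `str | None`:

```python
from typing import Optional

_default_workspace: Optional[str] = None


def set_default_workspace(workspace: Optional[str] = None) -> None:
    global _default_workspace
    _default_workspace = workspace


def get_default_workspace() -> str:
    # Assumption: an unset default is reported as the empty workspace "".
    return _default_workspace or ""


def resolve_workspace(workspace: Optional[str] = None) -> str:
    """Hypothetical helper: use the default only when no workspace is given."""
    return get_default_workspace() if workspace is None else workspace


set_default_workspace("ws_a")
legacy_call = resolve_workspace()          # legacy code passes nothing
explicit_call = resolve_workspace("ws_b")  # explicit value always wins
```

Legacy call sites that never pass a workspace end up in the default workspace, while an explicit argument, including the empty string, is used as given.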
6. Unified Namespace Naming Convention
Added get_final_namespace() function:
def get_final_namespace(namespace: str, workspace: str | None = None) -> str
- Centralized logic for combining workspace and namespace
- Format: <workspace>:<namespace>, or <namespace> when workspace is empty
- Ensures consistent naming across all namespace operations
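The naming rule can be written down in a few lines. This is a sketch of the format described above rather than the PR's code: it treats None the same as an empty workspace, whereas the real function presumably consults the default workspace first, and `Optional[str]` stands in for `str | None`:

```python
from typing import Optional


def get_final_namespace(namespace: str, workspace: Optional[str] = None) -> str:
    # Simplification: None and "" both mean "no workspace" in this sketch.
    if not workspace:
        return namespace
    return f"{workspace}:{namespace}"


with_workspace = get_final_namespace("pipeline_status", "ws_a")
without_workspace = get_final_namespace("pipeline_status", "")
```

Keeping the rule in one function means storage, lock, and update-flag code all derive the same key for a given workspace and namespace.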
7. Standardize empty workspace handling from "_" to "" across storage
- Unify empty workspace behavior by changing workspace from "_" to ""
- Fixed incorrect empty workspace detection in get_all_update_flags_status()
8. Auto-initialize pipeline status in initialize_storages()
- Auto-initialize pipeline status inside the initialize_storages() method
- Remove manual initialize_pipeline_status() calls across the codebase
- Update error and warning messages for clarity
- Update docs and examples
Key Modified Files
- lightrag/kg/shared_storage.py: Core modification file
  - Added workspace isolation logic
  - Implemented get_namespace_lock()
  - Implemented get_final_namespace()
  - Added default workspace support
  - Added workspace parameter to all namespace operation functions
- Storage Implementation Files (using new lock mechanism):
  - lightrag/kg/json_kv_impl.py
  - lightrag/kg/json_doc_status_impl.py
  - lightrag/kg/nano_vector_db_impl.py
  - lightrag/kg/faiss_impl.py
  - lightrag/kg/networkx_impl.py
  - All storage implementations now use get_namespace_lock() instead of legacy locks
- API and Core Logic Files:
  - lightrag/lightrag.py: Set default workspace
  - lightrag/api/lightrag_server.py: Pipeline status initialization
  - lightrag/api/routers/document_routes.py: Use new namespace lock interface
🧪 Testing Recommendations
- Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
- Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
- Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
- Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks
Expected Outcomes
- ✅ Complete workspace-level isolation
- ✅ LightRAG instances with different workspaces can run concurrently without interference
- ✅ Pipeline status no longer interferes across workspaces
- ✅ Optimized lock granularity, reduced unnecessary lock contention
- ✅ 100% backward compatible with existing code
@codex review
Codex Review: Didn't find any major issues. Keep them coming!
@danielaskdd Hi! I need to fix the commit author attribution before this PR merges.
Issue:
Commit 5f153582 uses email [email protected] which is NOT linked to my GitHub account @BukeLy. This means I won't get proper credit for my contribution.
My correct GitHub email: [email protected]
Request: Could you please update the commit author to use my verified email? This is a quick fix:
git checkout dev-workspace-isolation
git rebase -i 5f153582^
# Change 'pick' to 'edit' for commit 5f153582
git commit --amend --author="BukeLy <[email protected]>" --no-edit
git rebase --continue
git push --force
Why this matters:
Without this fix, GitHub won't attribute the commit to my account and I won't appear in Contributors despite doing this work in PR #2353.
I've already updated my own PR branch with the correct email. Thank you for your help!
Alternative (if rebasing is too risky):
Add Co-authored-by line:
Co-authored-by: BukeLy <[email protected]>