Thread Safety: Context Management for Server
Problem
gptme currently uses global state for configuration and tools, which causes issues in the server where multiple conversations run concurrently. We need proper thread isolation, especially after PR #500 introduces ChatConfig.
Key issues:
- Configuration is global but needs to be per-conversation
- Tools state is not properly isolated between threads
- Risk of cross-talk between conversations in server
Background
PR #500 introduces a layered config system:
```
Config (user level)
    ↓
ProjectConfig (project specific)
    ↓
ChatConfig (conversation specific)
```
This provides a good foundation for per-conversation config, but we need thread-safety for the server.
Proposal
Implement thread-local storage for ChatConfig and related state, while keeping Config/ProjectConfig as global (they're effectively read-only).
Design Goals
- Minimal and focused on critical thread-safety needs
- Build on top of PR #500's config system
- Keep global config for read-only state
- Easy to test and verify thread isolation
Implementation Details
Thread-local storage for:
- ChatConfig - Per-conversation settings
- Tool state - Loaded tools and their state
- Other conversation-specific state (model, workspace, etc)
Example usage in server:
```python
# Set thread-local config for request
with ChatContext(chat_config) as ctx:
    # Handle request
    # All operations use thread-local config
    ...
```
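A minimal sketch of what such a `ChatContext` could look like, built on `threading.local`. The class does not exist yet; all names here are illustrative, not the actual gptme API:

```python
import threading

# Thread-local slot holding the active per-conversation config.
_local = threading.local()

class ChatContext:
    """Installs a conversation's config for the current thread only."""

    def __init__(self, chat_config):
        self.chat_config = chat_config

    def __enter__(self):
        # Remember what was active before, so nested contexts restore cleanly.
        self._previous = getattr(_local, "chat_config", None)
        _local.chat_config = self.chat_config
        return self

    def __exit__(self, exc_type, exc, tb):
        _local.chat_config = self._previous

def get_chat_config():
    """Return the config for the current thread, or None outside a context."""
    return getattr(_local, "chat_config", None)
```

Because the state lives in `threading.local`, each server thread sees only the config it installed, regardless of what other threads are doing.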
Tasks
Initial setup:
- [ ] Wait for PR #500 to be merged
- [ ] Create new branch for thread-safety work
Implementation:
- [ ] Create ChatContext class for managing thread-local state
- [ ] Add thread-local storage for ChatConfig
- [ ] Add context manager interface
- [ ] Modify server to use ChatContext
- [ ] Add tool state isolation
- [ ] Update relevant tests for thread safety
Testing:
- [ ] Add concurrent request tests
- [ ] Test tool isolation
- [ ] Test config isolation
- [ ] Add stress tests for server
Documentation:
- [ ] Document thread safety considerations
- [ ] Add examples for server usage
- [ ] Update architecture docs
Testing Strategy
- Unit Tests:
  - [ ] Test ChatContext in isolation
  - [ ] Verify proper cleanup of thread-local storage
  - [ ] Test concurrent access patterns
- Integration Tests:
  - [ ] Test server with multiple concurrent requests
  - [ ] Verify no cross-talk between conversations
  - [ ] Test tool execution isolation
- Stress Tests:
  - [ ] High concurrency tests
  - [ ] Memory leak detection
  - [ ] Resource cleanup verification
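A self-contained sketch of the kind of concurrent-isolation test meant here, using plain `threading.local` to stand in for per-conversation state (names are illustrative):

```python
import threading

# Each thread stores its "conversation config" in a thread-local and
# repeatedly verifies it never observes another thread's value.
local = threading.local()
errors = []

def conversation(conv_id: str, barrier: threading.Barrier):
    local.config = {"conversation": conv_id}
    barrier.wait()  # release all threads at once to maximize interleaving
    for _ in range(1000):
        if local.config["conversation"] != conv_id:
            errors.append(conv_id)

barrier = threading.Barrier(8)
threads = [
    threading.Thread(target=conversation, args=(f"conv-{i}", barrier))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert not errors  # no cross-talk between threads
```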
Future Considerations
- Potential Expansions:
- Support for async/await
- More granular tool isolation
- Custom context factories
- Performance Optimization:
- Connection pooling
- Resource sharing where safe
- Cleanup optimization
References
- PR #500: Config Refactor
- Issue #XXX: Server Stability
- Python threading docs
Notes
- Keep changes minimal and focused
- Maintain backward compatibility
- Consider adding monitoring/debugging tools
- Document any limitations or edge cases
This issue was generated with gptme, after discussing the design of thread-local context management for the server.
Alternative Approach: Pure Functional Config Passing
Instead of using thread-locals/context managers, we could pass config explicitly through the call chain. This would be a more fundamental architectural shift towards a purely functional style.
Benefits
- More purely functional approach
  - No hidden global state
  - Pure functions are easier to test and reason about
  - Dependencies are explicit
- Thread safety "for free"
  - No need for thread-local storage
  - No risk of cross-talk between threads
  - Works naturally with async/await
- Better testability
  - Easy to inject test configs
  - No need to mock global state
  - Clear what state each function needs
Example: Config Evolution
Config gets "upgraded" as it passes through initialization:
```python
# Base config (from ~/.config/gptme/config.toml)
config = Config(...)

# Add project-specific config
project_config = ProjectConfig.from_config(config, workspace)

# Create chat-specific config
chat_config = ChatConfig.from_project(
    project_config,
    model="anthropic/claude-3",
    tools=["shell", "python"],
)
```
Key Components to Modify
- Entry Points:

```python
# Before
def chat(prompts: list[Message], ...):
    config = get_config()  # global
    ...

# After
def chat(config: ChatConfig, prompts: list[Message], ...):
    ...
```
- Tool Execution:

```python
# Before
def execute_msg(msg: Message, confirm: ConfirmFunc):
    tools = get_tools()  # global
    ...

# After
def execute_msg(config: ChatConfig, msg: Message, confirm: ConfirmFunc):
    ...
```
- Server Handlers:

```python
# Before
@app.route("/api/v2/chat")
def chat_handler():
    config = get_config()  # global
    ...

# After
@app.route("/api/v2/chat")
def chat_handler():
    config = create_request_config()  # creates fresh config for request
    ...
```
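`create_request_config` is hypothetical; the key property is that each request gets its own config object with no shared mutable state. A sketch under those assumptions (the `ChatConfig` fields here are illustrative, not the actual PR #500 API):

```python
from dataclasses import dataclass, field

@dataclass
class ChatConfig:
    # Illustrative fields; not the real ChatConfig from PR #500.
    model: str = "default"
    tools: list[str] = field(default_factory=list)

def create_request_config(params: dict) -> ChatConfig:
    """Build a fresh, isolated config from request parameters."""
    return ChatConfig(
        model=params.get("model", "default"),
        tools=list(params.get("tools", [])),  # copy to avoid sharing the list
    )

# Two concurrent requests get independent configs:
a = create_request_config({"model": "anthropic/claude-3"})
b = create_request_config({})
a.tools.append("shell")
assert b.tools == []  # mutating one request's config doesn't leak into another
```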
Migration Strategy
1. Start with core components:
   - [ ] Add config parameter to key functions
   - [ ] Keep global fallback during transition
   - [ ] Update tests to use explicit config
2. Modify tools system:
   - [ ] Make tools take config explicitly
   - [ ] Remove global tool state
   - [ ] Update tool tests
3. Update server:
   - [ ] Create per-request configs
   - [ ] Remove thread-local state
   - [ ] Add request isolation tests
4. Clean up:
   - [ ] Remove global config access
   - [ ] Remove thread-local code
   - [ ] Update documentation
Comparison to Thread-Local Approach
Thread-Local:
- (+) Less refactoring needed
- (+) Easier backward compatibility
- (-) Hidden state
- (-) Need careful thread management
- (-) More complex testing
Pure Functional:
- (+) Cleaner architecture
- (+) Better testing
- (+) Natural thread safety
- (-) More refactoring needed
- (-) Need to pass config everywhere
Decision Points
1. Backward Compatibility
   - How to handle existing code during transition?
   - Keep globals as fallback temporarily?
2. Config Lifecycle
   - When/where to create different config levels?
   - How to handle config updates?
3. Performance Impact
   - Cost of passing config vs thread-local
   - Memory usage patterns
   - Garbage collection behavior
Recommendation
The pure functional approach would be a bigger change but would result in a cleaner, more maintainable codebase. The main tradeoff is the amount of refactoring needed vs the benefits of explicit state passing.
Suggested next steps:
- Start with a small core component
- Prove the approach works
- Gradually expand to more components
- Keep backward compatibility during transition
This would be a more ambitious change than the thread-local approach, but would result in a more robust and maintainable system in the long run.
Got rid of the use of `get_tools` in prompt generation in https://github.com/gptme/gptme/pull/503; `tools: list[ToolSpec]` is now passed down explicitly.
Made the config and loaded-tools globals into thread-locals in https://github.com/gptme/gptme/pull/506
Attempted switching from thread-locals to context vars in https://github.com/gptme/gptme/pull/507 (mostly to improve typing, but some tests are failing...)
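For reference, the context-vars variant mentioned above looks roughly like this sketch (variable names are illustrative, not the PR #507 code). Unlike `threading.local`, a `ContextVar` is generically typed and also propagates correctly across async tasks:

```python
from contextvars import ContextVar
from typing import Optional

# Typed holder for the active per-conversation config.
chat_config_var: ContextVar[Optional[dict]] = ContextVar("chat_config", default=None)

token = chat_config_var.set({"model": "anthropic/claude-3"})
try:
    assert chat_config_var.get()["model"] == "anthropic/claude-3"
finally:
    # reset() restores the previous value, like exiting a context manager.
    chat_config_var.reset(token)

assert chat_config_var.get() is None
```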
This issue is largely solved. Some rough edges remain, but far fewer than before.