Thread Safety: Context Management for Server
Problem
gptme currently uses global state for configuration and tools, which causes issues in the server where multiple conversations run concurrently. We need proper thread isolation, especially after PR #500 introduces ChatConfig.
Key issues:
- Configuration is global but needs to be per-conversation
- Tools state is not properly isolated between threads
- Risk of cross-talk between conversations in server
Background
PR #500 introduces a layered config system:
```
Config (user level)
    ↓
ProjectConfig (project specific)
    ↓
ChatConfig (conversation specific)
```
This provides a good foundation for per-conversation config, but we need thread-safety for the server.
Proposal
Implement thread-local storage for ChatConfig and related state, while keeping Config/ProjectConfig as global (they're effectively read-only).
Design Goals
- Minimal and focused on critical thread-safety needs
- Build on top of PR #500's config system
- Keep global config for read-only state
- Easy to test and verify thread isolation
Implementation Details
Thread-local storage for:
- ChatConfig - Per-conversation settings
- Tool state - Loaded tools and their state
- Other conversation-specific state (model, workspace, etc)
Example usage in server:
```python
# Set thread-local config for request
with ChatContext(chat_config) as ctx:
    # Handle request
    # All operations use thread-local config
    ...
```
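A minimal sketch of what such a `ChatContext` could look like, built on `threading.local`. The class does not exist yet; all names here are illustrative, not the actual gptme API:

```python
import threading

# Thread-local slot holding the active per-conversation config.
_local = threading.local()

class ChatContext:
    """Installs a conversation's config for the current thread only."""

    def __init__(self, chat_config):
        self.chat_config = chat_config

    def __enter__(self):
        # Remember what was active before, so nested contexts restore cleanly.
        self._previous = getattr(_local, "chat_config", None)
        _local.chat_config = self.chat_config
        return self

    def __exit__(self, exc_type, exc, tb):
        _local.chat_config = self._previous

def get_chat_config():
    """Return the config for the current thread, or None outside a context."""
    return getattr(_local, "chat_config", None)
```

Because the state lives in `threading.local`, each server thread sees only the config it installed, regardless of what other threads are doing.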
Tasks
Initial setup:
- [ ] Wait for PR #500 to be merged
- [ ] Create new branch for thread-safety work
Implementation:
- [ ] Create ChatContext class for managing thread-local state
- [ ] Add thread-local storage for ChatConfig
- [ ] Add context manager interface
- [ ] Modify server to use ChatContext
- [ ] Add tool state isolation
- [ ] Update relevant tests for thread safety
Testing:
- [ ] Add concurrent request tests
- [ ] Test tool isolation
- [ ] Test config isolation
- [ ] Add stress tests for server
Documentation:
- [ ] Document thread safety considerations
- [ ] Add examples for server usage
- [ ] Update architecture docs
Testing Strategy
- Unit Tests:
  - [ ] Test ChatContext in isolation
  - [ ] Verify proper cleanup of thread-local storage
  - [ ] Test concurrent access patterns
- Integration Tests:
  - [ ] Test server with multiple concurrent requests
  - [ ] Verify no cross-talk between conversations
  - [ ] Test tool execution isolation
- Stress Tests:
  - [ ] High concurrency tests
  - [ ] Memory leak detection
  - [ ] Resource cleanup verification
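A self-contained sketch of the kind of concurrent-isolation test meant here, using plain `threading.local` to stand in for per-conversation state (names are illustrative):

```python
import threading

# Each thread stores its "conversation config" in a thread-local and
# repeatedly verifies it never observes another thread's value.
local = threading.local()
errors = []

def conversation(conv_id: str, barrier: threading.Barrier):
    local.config = {"conversation": conv_id}
    barrier.wait()  # release all threads at once to maximize interleaving
    for _ in range(1000):
        if local.config["conversation"] != conv_id:
            errors.append(conv_id)

barrier = threading.Barrier(8)
threads = [
    threading.Thread(target=conversation, args=(f"conv-{i}", barrier))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert not errors  # no cross-talk between threads
```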
Future Considerations
- Potential Expansions:
- Support for async/await
- More granular tool isolation
- Custom context factories
- Performance Optimization:
- Connection pooling
- Resource sharing where safe
- Cleanup optimization
References
- PR #500: Config Refactor
- Issue #XXX: Server Stability
- Python threading docs
Notes
- Keep changes minimal and focused
- Maintain backward compatibility
- Consider adding monitoring/debugging tools
- Document any limitations or edge cases
This issue was generated with gptme, after discussing the design of thread-local context management for the server.
Alternative Approach: Pure Functional Config Passing
Instead of using thread-locals/context managers, we could pass config explicitly through the call chain. This would be a more fundamental architectural shift towards a purely functional style.
Benefits
- More purely functional approach
  - No hidden global state
  - Pure functions are easier to test and reason about
  - Dependencies are explicit
- Thread safety "for free"
  - No need for thread-local storage
  - No risk of cross-talk between threads
  - Works naturally with async/await
- Better testability
  - Easy to inject test configs
  - No need to mock global state
  - Clear what state each function needs
Example: Config Evolution
Config gets "upgraded" as it passes through initialization:
```python
# Base config (from ~/.config/gptme/config.toml)
config = Config(...)

# Add project-specific config
project_config = ProjectConfig.from_config(config, workspace)

# Create chat-specific config
chat_config = ChatConfig.from_project(
    project_config,
    model="anthropic/claude-3",
    tools=["shell", "python"],
)
```
Key Components to Modify
- Entry Points:

```python
# Before
def chat(prompts: list[Message], ...):
    config = get_config()  # global
    ...

# After
def chat(config: ChatConfig, prompts: list[Message], ...):
    ...
```
- Tool Execution:

```python
# Before
def execute_msg(msg: Message, confirm: ConfirmFunc):
    tools = get_tools()  # global
    ...

# After
def execute_msg(config: ChatConfig, msg: Message, confirm: ConfirmFunc):
    ...
```
- Server Handlers:

```python
# Before
@app.route("/api/v2/chat")
def chat_handler():
    config = get_config()  # global
    ...

# After
@app.route("/api/v2/chat")
def chat_handler():
    config = create_request_config()  # creates fresh config for request
    ...
```
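`create_request_config` is hypothetical; the key property is that each request gets its own config object with no shared mutable state. A sketch under those assumptions (the `ChatConfig` fields here are illustrative, not the actual PR #500 API):

```python
from dataclasses import dataclass, field

@dataclass
class ChatConfig:
    # Illustrative fields; not the real ChatConfig from PR #500.
    model: str = "default"
    tools: list[str] = field(default_factory=list)

def create_request_config(params: dict) -> ChatConfig:
    """Build a fresh, isolated config from request parameters."""
    return ChatConfig(
        model=params.get("model", "default"),
        tools=list(params.get("tools", [])),  # copy to avoid sharing the list
    )

# Two concurrent requests get independent configs:
a = create_request_config({"model": "anthropic/claude-3"})
b = create_request_config({})
a.tools.append("shell")
assert b.tools == []  # mutating one request's config doesn't leak into another
```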
Migration Strategy
1. Start with core components:
   - [ ] Add config parameter to key functions
   - [ ] Keep global fallback during transition
   - [ ] Update tests to use explicit config
2. Modify tools system:
   - [ ] Make tools take config explicitly
   - [ ] Remove global tool state
   - [ ] Update tool tests
3. Update server:
   - [ ] Create per-request configs
   - [ ] Remove thread-local state
   - [ ] Add request isolation tests
4. Clean up:
   - [ ] Remove global config access
   - [ ] Remove thread-local code
   - [ ] Update documentation
Comparison to Thread-Local Approach
Thread-Local:
- (+) Less refactoring needed
- (+) Easier backward compatibility
- (-) Hidden state
- (-) Need careful thread management
- (-) More complex testing
Pure Functional:
- (+) Cleaner architecture
- (+) Better testing
- (+) Natural thread safety
- (-) More refactoring needed
- (-) Need to pass config everywhere
Decision Points
1. Backward Compatibility
   - How to handle existing code during transition?
   - Keep globals as fallback temporarily?
2. Config Lifecycle
   - When/where to create different config levels?
   - How to handle config updates?
3. Performance Impact
   - Cost of passing config vs thread-local
   - Memory usage patterns
   - Garbage collection behavior
Recommendation
The pure functional approach would be a bigger change but would result in a cleaner, more maintainable codebase. The main tradeoff is the amount of refactoring needed vs the benefits of explicit state passing.
Suggested next steps:
- Start with a small core component
- Prove the approach works
- Gradually expand to more components
- Keep backward compatibility during transition
This would be a more ambitious change than the thread-local approach, but would result in a more robust and maintainable system in the long run.
Got rid of the use of `get_tools` in prompt generation in https://github.com/gptme/gptme/pull/503; `tools: list[ToolSpec]` is now passed down explicitly.
Made the config and loaded-tools globals into thread-locals in https://github.com/gptme/gptme/pull/506
Attempted switching from thread-locals to context vars in https://github.com/gptme/gptme/pull/507 (mostly to improve typing, but some tests are failing...)
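For reference, the context-vars variant mentioned above looks roughly like this sketch (variable names are illustrative, not the PR #507 code). Unlike `threading.local`, a `ContextVar` is generically typed and also propagates correctly across async tasks:

```python
from contextvars import ContextVar
from typing import Optional

# Typed holder for the active per-conversation config.
chat_config_var: ContextVar[Optional[dict]] = ContextVar("chat_config", default=None)

token = chat_config_var.set({"model": "anthropic/claude-3"})
try:
    assert chat_config_var.get()["model"] == "anthropic/claude-3"
finally:
    # reset() restores the previous value, like exiting a context manager.
    chat_config_var.reset(token)

assert chat_config_var.get() is None
```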
This issue is largely solved. Some rough edges remain, but far fewer than before.