
Thread Safety: Context Management for Server

Open TimeToBuildBob opened this issue 7 months ago • 4 comments

Problem

gptme currently uses global state for configuration and tools, which causes issues in the server, where multiple conversations run concurrently. We need proper thread isolation, especially once PR #500 introduces ChatConfig.

Key issues:

  • Configuration is global but needs to be per-conversation
  • Tools state is not properly isolated between threads
  • Risk of cross-talk between conversations in server

Background

PR #500 introduces a layered config system:

Config (user level)
  ↓
ProjectConfig (project specific)
  ↓
ChatConfig (conversation specific)

This provides a good foundation for per-conversation config, but we need thread-safety for the server.

Proposal

Implement thread-local storage for ChatConfig and related state, while keeping Config/ProjectConfig as global (they're effectively read-only).

Design Goals

  • Minimal and focused on critical thread-safety needs
  • Build on top of PR #500's config system
  • Keep global config for read-only state
  • Easy to test and verify thread isolation

Implementation Details

Thread-local storage for:

  1. ChatConfig - Per-conversation settings
  2. Tool state - Loaded tools and their state
  3. Other conversation-specific state (model, workspace, etc)

Example usage in server:

# Set thread-local config for request
with ChatContext(chat_config) as ctx:
    # Handle request
    # All operations use thread-local config
    ...

Tasks

Initial setup:

  • [ ] Wait for PR #500 to be merged
  • [ ] Create new branch for thread-safety work

Implementation:

  • [ ] Create ChatContext class for managing thread-local state
  • [ ] Add thread-local storage for ChatConfig
  • [ ] Add context manager interface
  • [ ] Modify server to use ChatContext
  • [ ] Add tool state isolation
  • [ ] Update relevant tests for thread safety

Testing:

  • [ ] Add concurrent request tests
  • [ ] Test tool isolation
  • [ ] Test config isolation
  • [ ] Add stress tests for server

Documentation:

  • [ ] Document thread safety considerations
  • [ ] Add examples for server usage
  • [ ] Update architecture docs

Testing Strategy

  1. Unit Tests:
  • [ ] Test ChatContext in isolation
  • [ ] Verify proper cleanup of thread-local storage
  • [ ] Test concurrent access patterns
  2. Integration Tests:
  • [ ] Test server with multiple concurrent requests
  • [ ] Verify no cross-talk between conversations
  • [ ] Test tool execution isolation
  3. Stress Tests:
  • [ ] High concurrency tests
  • [ ] Memory leak detection
  • [ ] Resource cleanup verification
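A hypothetical sketch of the concurrent-access unit test: each thread installs its own value in thread-local storage, all threads overlap at a barrier, and each then verifies it still sees its own value (no cross-talk). A bare threading.local() stands in for the proposed ChatContext storage.

```python
import threading

_local = threading.local()
_barrier = threading.Barrier(8)
results = {}

def worker(name: str):
    _local.config = name           # per-conversation state
    _barrier.wait()                # force all threads to overlap in time
    results[name] = _local.config  # would differ from name if state leaked

threads = [threading.Thread(target=worker, args=(f"conv-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread saw exactly the value it installed
assert all(results[f"conv-{i}"] == f"conv-{i}" for i in range(8))
```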

Future Considerations

  1. Potential Expansions:
  • Support for async/await
  • More granular tool isolation
  • Custom context factories
  2. Performance Optimization:
  • Connection pooling
  • Resource sharing where safe
  • Cleanup optimization

Notes

  • Keep changes minimal and focused
  • Maintain backward compatibility
  • Consider adding monitoring/debugging tools
  • Document any limitations or edge cases

TimeToBuildBob · Apr 14 '25 20:04

This issue was generated with gptme, after discussing the design of thread-local context management for the server.

TimeToBuildBob · Apr 14 '25 20:04

Alternative Approach: Pure Functional Config Passing

Instead of using thread-locals/context managers, we could pass config explicitly through the call chain. This would be a more fundamental architectural shift towards a more purely functional style.

Benefits

  1. More purely functional approach

    • No hidden global state
    • Pure functions are easier to test and reason about
    • Dependencies are explicit
  2. Thread safety "for free"

    • No need for thread-local storage
    • No risk of cross-talk between threads
    • Works naturally with async/await
  3. Better testability

    • Easy to inject test configs
    • No need to mock global state
    • Clear what state each function needs
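The testability benefit can be illustrated with a minimal sketch: with explicit passing, a test constructs its own config and calls the function directly, with no monkeypatching of globals. All names here are hypothetical, not gptme's actual code.

```python
from dataclasses import dataclass

@dataclass
class ChatConfig:  # stand-in for gptme's ChatConfig
    model: str

def build_system_prompt(config: ChatConfig) -> str:
    # Pure function: output depends only on its arguments
    return f"You are a helpful assistant backed by {config.model}."

def test_build_system_prompt():
    # Inject a test config directly; no global state to set up or tear down
    prompt = build_system_prompt(ChatConfig(model="test-model"))
    assert "test-model" in prompt

test_build_system_prompt()
```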

Example: Config Evolution

Config gets "upgraded" as it passes through initialization:

# Base config (from ~/.config/gptme/config.toml)
config = Config(...)

# Add project-specific config
project_config = ProjectConfig.from_config(config, workspace)

# Create chat-specific config
chat_config = ChatConfig.from_project(
    project_config,
    model="anthropic/claude-3",
    tools=["shell", "python"],
)

Key Components to Modify

  1. Entry Points:

# Before
def chat(prompts: list[Message], ...):
    config = get_config()  # global
    ...

# After
def chat(config: ChatConfig, prompts: list[Message], ...):
    ...

  2. Tool Execution:

# Before
def execute_msg(msg: Message, confirm: ConfirmFunc):
    tools = get_tools()  # global
    ...

# After
def execute_msg(config: ChatConfig, msg: Message, confirm: ConfirmFunc):
    ...

  3. Server Handlers:

# Before
@app.route("/api/v2/chat")
def chat_handler():
    config = get_config()  # global
    ...

# After
@app.route("/api/v2/chat")
def chat_handler():
    config = create_request_config()  # creates a fresh config per request
    ...
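A minimal sketch of what a per-request config factory could look like: build a fresh, immutable ChatConfig from request parameters so that each request is isolated by construction. The ChatConfig shape and defaults here are assumptions for illustration, not gptme's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChatConfig:  # stand-in for gptme's ChatConfig; frozen for safety
    model: str
    tools: tuple = ()

def create_request_config(params: dict) -> ChatConfig:
    """Build an isolated, immutable config from request parameters."""
    return ChatConfig(
        model=params.get("model", "anthropic/claude-3"),
        tools=tuple(params.get("tools", ("shell",))),
    )

# Each request gets its own instance; nothing is shared or mutated
cfg = create_request_config({"model": "gpt-4o", "tools": ["python"]})
```

Freezing the dataclass makes accidental cross-request mutation an error rather than a silent bug, which is much of the point of the pure-functional approach.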

Migration Strategy

  1. Start with core components:

    • [ ] Add config parameter to key functions
    • [ ] Keep global fallback during transition
    • [ ] Update tests to use explicit config
  2. Modify tools system:

    • [ ] Make tools take config explicitly
    • [ ] Remove global tool state
    • [ ] Update tool tests
  3. Update server:

    • [ ] Create per-request configs
    • [ ] Remove thread-local state
    • [ ] Add request isolation tests
  4. Clean up:

    • [ ] Remove global config access
    • [ ] Remove thread-local code
    • [ ] Update documentation

Comparison to Thread-Local Approach

Thread-Local:

  • (+) Less refactoring needed
  • (+) Easier backward compatibility
  • (-) Hidden state
  • (-) Need careful thread management
  • (-) More complex testing

Pure Functional:

  • (+) Cleaner architecture
  • (+) Better testing
  • (+) Natural thread safety
  • (-) More refactoring needed
  • (-) Need to pass config everywhere

Decision Points

  1. Backward Compatibility

    • How to handle existing code during transition?
    • Keep globals as fallback temporarily?
  2. Config Lifecycle

    • When/where to create different config levels?
    • How to handle config updates?
  3. Performance Impact

    • Cost of passing config vs thread-local
    • Memory usage patterns
    • Garbage collection behavior

Recommendation

The pure functional approach would be a bigger change but would result in a cleaner, more maintainable codebase. The main tradeoff is the amount of refactoring needed vs the benefits of explicit state passing.

Suggested next steps:

  1. Start with a small core component
  2. Prove the approach works
  3. Gradually expand to more components
  4. Keep backward compatibility during transition

This would be a more ambitious change than the thread-local approach, but would result in a more robust and maintainable system in the long run.

TimeToBuildBob · Apr 14 '25 20:04

Removed the use of get_tools in prompt generation in https://github.com/gptme/gptme/pull/503; it now passes down tools: list[ToolSpec].

ErikBjare · Apr 16 '25 11:04

Made the config and loaded-tools globals into thread-locals in https://github.com/gptme/gptme/pull/506.

Attempted switching from thread-locals to using context vars in https://github.com/gptme/gptme/pull/507 (mostly to improve typing, but some tests are failing...)

ErikBjare · Apr 18 '25 14:04

This issue is largely solved. Some rough edges remain, but far fewer than before.

ErikBjare · Jun 09 '25 13:06