toolhive icon indicating copy to clipboard operation
toolhive copied to clipboard

`RunConfig` as architectural boundary proposal

Open blkt opened this issue 2 months ago • 6 comments

This change contains a design doc describing an architectural boundary at RunConfig level. Aim of this design doc is

  • listing the current responsibilities of RunConfig and related code (e.g. Runner)
  • propose a new split of responsibilities and a clear golang API for it
  • clarify what responsibilities should be moved in the respective user interfaces, e.g. CLI or Operator

blkt avatar Oct 06 '25 16:10 blkt

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 53.45%. Comparing base (3453db0) to head (6beef0c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2096      +/-   ##
==========================================
+ Coverage   53.43%   53.45%   +0.01%     
==========================================
  Files         230      230              
  Lines       29385    29385              
==========================================
+ Hits        15703    15707       +4     
+ Misses      12552    12549       -3     
+ Partials     1130     1129       -1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 06 '25 16:10 codecov[bot]

@claude please review this architectural proposal

JAORMX avatar Oct 07 '25 08:10 JAORMX

Claude finished @JAORMX's task —— View job


Architectural Proposal Review

I've reviewed the RunConfig as architectural boundary proposal. Here's my comprehensive feedback:


Overall Assessment

This is a well-structured and necessary proposal that addresses real architectural concerns in ToolHive. The separation of concerns between RunConfig and the consumer interfaces (CLI, Operator, UI) is architecturally sound and will improve maintainability and extensibility.


Strengths

  1. Clear Problem Statement: The proposal effectively articulates the current challenges with configuration handling across different interfaces (CLI, Operator, UI).

  2. Practical Examples: The use cases (config reload semantics, storage location differences, export/import) clearly demonstrate why this boundary is needed.

  3. Appropriate Scope: Making RunConfig analogous to encoding/json (accepting bytes/readers, returning bytes) is the right level of abstraction.

  4. Current State Documentation: Lines 22-70 provide an excellent inventory of what RunConfig currently contains, split into metadata, workload details, and proxy details.


Areas for Improvement

1. Implementation Guidance Missing

The proposal describes what should change but lacks how to achieve it. Consider adding:

  • Migration Strategy: How do we move from the current state to the proposed state?

    • Which fields need to be removed from RunConfig?
    • What's the deprecation timeline for file-based parameters?
    • How do we handle backward compatibility?
  • API Surface Definition: What should the new public API look like?

    // Example:
    // Instead of exposing RunConfig directly, consider:
    type RunConfig struct { /* ... */ }
    func NewRunConfig(opts ...Option) (*RunConfig, error)
    func (rc *RunConfig) Validate() error
    func (rc *RunConfig) MarshalJSON() ([]byte, error)
    func UnmarshalRunConfig(data []byte) (*RunConfig, error)
    

2. Concrete Examples of Violations

The proposal would benefit from specific examples of current violations:

Current problematic fields in RunConfig (from pkg/runner/config.go:34-162):

  • EnvFileDir (line 88) - File system reference
  • PermissionProfileNameOrPath (line 79) - File path reference
  • AuthzConfigPath (line 107) - File path reference
  • AuditConfigPath (line 113) - File path reference
  • JWKSAuthTokenFile (line 145) - File path reference
  • Deployer (line 127) - Runtime-specific dependency

These should be called out explicitly as fields that violate the proposed boundary.

3. Validation Responsibility Unclear

Line 75 mentions "basic validation" as a RunConfig responsibility, but doesn't define what qualifies as "basic":

Suggested clarification:

  • Basic validation (RunConfig's responsibility): Structural validity, required fields present, type constraints
  • Contextual validation (CLI/Operator responsibility): File existence, network availability, secret references, registry lookups

4. Serialization Format Not Specified

Line 75 mentions serialization but doesn't specify:

  • What formats should be supported? (JSON only? YAML? Both?)
  • Should the format be versioned? (The struct has SchemaVersion but no migration strategy)
  • What happens when SchemaVersion changes?

5. Missing Discussion of Runner Responsibilities

The proposal mentions Runner in the title and problem statement but doesn't clearly define its responsibilities post-refactor. Consider adding:

Proposed Runner Responsibilities:

  • Accept a validated RunConfig
  • Execute workloads using the provided deployer
  • Manage workload lifecycle (start, stop, monitor)
  • Handle middleware chain construction
  • NOT responsible for: config file reading, secret fetching, registry lookups

6. Impact on Existing Code Not Assessed

The proposal doesn't mention:

  • How many call sites need updates?
  • What's the blast radius of this change?
  • Which packages are affected?

Affected areas (from my analysis):

  • cmd/thv/app/run.go - CLI configuration building
  • cmd/thv-operator/controllers/mcpserver_runconfig.go - Operator CRD mapping
  • pkg/runner/config_builder.go - Builder pattern needs updating
  • pkg/state/runconfig.go - Storage abstraction

7. "No File-Based Representation" Needs Clarification

Line 77 states: "no file-based representation of configuration parameters is allowed in the RunConfig struct"

Question: Does this mean:

  • ❌ No file paths (like EnvFileDir, AuthzConfigPath)?
  • ❌ No references to external files at all?
  • ✅ File contents should be resolved by CLI/Operator before creating RunConfig?

This should be explicit with before/after examples:

// ❌ Current (bad)
type RunConfig struct {
    AuthzConfigPath string
}

// ✅ Proposed (good)  
type RunConfig struct {
    AuthzConfig *authz.Config // already resolved
}

8. External API Stability Concern

Line 81 mentions: "Types exposed by RunConfig package should not be used for externally facing formats like HTTP API"

Concern: The current RunConfig struct (pkg/runner/config.go:34-162) has a comment stating it's "part of ToolHive's API contract" (line 36-37). This proposal doesn't address:

  • Is RunConfig JSON format considered a public API?
  • How do we version it if we need breaking changes?
  • What's the compatibility guarantee?

Specific Technical Concerns

1. Storage Abstraction Already Exists

The proposal treats storage as CLI/Operator responsibility, but pkg/state/runconfig.go already provides a storage abstraction layer. How does this fit into the new design?

Recommendation: Clarify whether pkg/state is above or below the boundary.

2. Current Builder Pattern May Need Redesign

pkg/runner/config_builder.go currently handles file resolution during building. With the proposed changes, this logic moves to CLI/Operator, but the builder pattern might still be useful for constructing valid RunConfig instances.

Recommendation: Provide guidance on whether builders should exist and where.

3. Middleware Configuration Complexity

The MiddlewareConfigs field (pkg/runner/config.go:159-161) is complex and touches both proxy configuration and execution environment. The proposal doesn't clarify if/how middleware configuration should be handled.


Missing Sections

Consider adding:

  1. Non-Goals: What are we explicitly NOT trying to solve?

    • Are we keeping backward compatibility for existing serialized configs?
    • Are we refactoring the Operator CRD structure?
  2. Success Criteria: How do we know when this is complete?

    • All file path references removed from RunConfig?
    • CLI and Operator can construct RunConfig without accessing filesystem during Runner.Run()?
    • Configuration reload is handled at CLI/Operator level?
  3. Testing Strategy: How do we validate the boundary is maintained?

    • Linter rules to prevent file paths in RunConfig?
    • Interface contracts to enforce separation?
  4. Timeline: When do we want this implemented?


Recommendations

Priority 1 (Critical for proposal acceptance):

  1. Add concrete examples of fields that violate the current boundary
  2. Define what "basic validation" means
  3. Clarify the "no file-based representation" rule with examples
  4. Add migration strategy and backward compatibility approach

Priority 2 (Important for implementation):

  1. Define the new public API surface
  2. Clarify Runner responsibilities post-refactor
  3. Specify serialization format and versioning strategy
  4. Assess impact on existing code

Priority 3 (Nice to have):

  1. Add non-goals and success criteria
  2. Include testing strategy
  3. Define timeline

Conclusion

This proposal identifies a legitimate architectural issue and proposes a sensible solution. However, it needs more implementation detail to be actionable. The core idea of treating RunConfig as a pure data structure (like encoding/json) is sound, but the transition path from current state to desired state needs to be mapped out.

Recommendation: Expand the "Responsibility Split" section significantly before implementation begins, or create a follow-up implementation plan document that addresses the gaps identified above.


References

  • Current RunConfig implementation: pkg/runner/config.go:34-162
  • Current storage abstraction: pkg/state/runconfig.go
  • Operator usage: cmd/thv-operator/controllers/mcpserver_runconfig.go
  • CLI usage: cmd/thv/app/run.go

Job Run | Branch: docs/runconfig-proposal

claude[bot] avatar Oct 07 '25 08:10 claude[bot]

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

@JAORMX currently, RunConfig contains "how to identify the secret", but not "where to get it from". Additionally, it is only possible to configure a single secrets provider at a time. We're fine keeping this as is for now.

blkt avatar Oct 08 '25 12:10 blkt

@blkt Was there anything on this we needed to do to get it ready for review?

ChrisJBurns avatar Nov 18 '25 19:11 ChrisJBurns

Yes, I need to get rid of obsolete fields, which should be either tagged or just absent in the RunConfig struct.

blkt avatar Nov 18 '25 19:11 blkt