
Idea: Introduce MCPGateway CRD for pluggable gateway implementations

JAORMX opened this issue 1 month ago • 0 comments

Problem Statement

Users can already deploy alternative MCP gateway solutions (like kagenti/mcp-gateway) alongside ToolHive, and Virtual MCP is entirely optional. However, our current CRDs (VirtualMCPServer, MCPGroup) are not as abstract and reusable as they could be for enabling a true plug-and-play ecosystem.

While different gateway implementations can coexist today, they don't share common abstractions. Each gateway solution must define its own resources and discovery mechanisms, forgoing the interoperability that patterns like the Kubernetes Gateway API and Ingress provide.

Current State

What works today:

  • ✅ Users can deploy any MCP gateway (Virtual MCP, kagenti/mcp-gateway, custom solutions)
  • ✅ Virtual MCP is completely optional
  • ✅ MCPGroup provides backend grouping and discovery
  • ✅ Multiple gateway types can run in the same cluster

What could be better:

  • ❌ No standardized gateway abstraction
  • ❌ Each implementation defines its own CRD structure
  • ❌ Limited interoperability between gateway implementations
  • ❌ VirtualMCPServer is tightly coupled to our specific implementation
  • ❌ Switching gateway implementations requires resource rewrites

Current Architecture

VirtualMCPServer (our implementation):

  • References MCPGroup to discover backend workloads via ListWorkloadsInGroup()
  • Handles aggregation, conflict resolution, composite tools
  • Manages authentication (incoming/outgoing)
  • Deploys as a Deployment with a Service
  • Mixes implementation-specific configuration with common gateway concerns

MCPGroup (shared resource):

  • Simple grouping mechanism with label selectors
  • Lists member MCPServer workloads
  • Provides a backend discovery interface (see the sketch after this list)
  • Already useful across different gateway implementations
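
For illustration, the discovery contract that MCPGroup already enables could be expressed roughly as the interface below. This is a hypothetical sketch only; `BackendDiscoverer` and `Workload` are illustrative names, not the actual types in `pkg/vmcp/aggregator/discoverer.go`.

```go
package discovery

import "context"

// Workload is a minimal, gateway-agnostic view of a backend MCP server.
// (Illustrative only; the real discoverer may expose richer information.)
type Workload struct {
	Name     string            // MCPServer resource name
	Endpoint string            // URL the gateway proxies MCP traffic to
	Labels   map[string]string // labels matched by MCPGroup selectors
}

// BackendDiscoverer sketches the contract any MCPGroup-based gateway
// could depend on, independent of how the gateway itself is implemented.
type BackendDiscoverer interface {
	// ListWorkloadsInGroup resolves the named MCPGroup's label selectors
	// and returns its member MCPServer workloads.
	ListWorkloadsInGroup(ctx context.Context, groupName string) ([]Workload, error)
}
```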

Inspiration from Kubernetes Patterns

Gateway API Pattern

  • GatewayClass: Defines configuration template, handled by specific controller
  • Gateway: Instance of a class, requests traffic translation
  • Routes: Define how traffic maps to services
  • Multiple implementations can coexist (Istio, NGINX, Envoy)

Ingress Pattern

  • IngressClass: Specifies which controller handles the Ingress
  • Ingress: Defines traffic rules
  • Controllers claim responsibility via ingressClassName (see the example after this list)
  • Multiple controllers coexist peacefully
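
For reference, the Ingress flavour of this pattern reduces to two fields in the standard `k8s.io/api/networking/v1` types; the controller string and class name below are just examples.

```go
package main

import (
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The IngressClass announces which controller implements it.
	class := networkingv1.IngressClass{
		ObjectMeta: metav1.ObjectMeta{Name: "nginx"},
		Spec:       networkingv1.IngressClassSpec{Controller: "k8s.io/ingress-nginx"},
	}

	// An Ingress opts into that class; only the matching controller acts on
	// it, so several controllers can coexist in the same cluster.
	className := class.Name
	_ = networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{Name: "example"},
		Spec: networkingv1.IngressSpec{
			IngressClassName: &className,
			// Rules omitted; they would map hosts/paths to Services.
		},
	}
}
```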

kagenti/mcp-gateway

  • Envoy-based aggregation with ext_proc filter
  • Supports both standalone (file config) and Kubernetes (CRD) modes
  • Integrates with Gateway API HTTPRoute
  • Emphasizes policy flexibility through Envoy filters
  • Uses tool prefixes for conflict resolution

Potential Vision: MCPGateway + MCPGatewayClass

Similar to Gateway API, we could introduce standardized abstractions:

MCPGatewayClass

Defines the gateway implementation and its configuration contract (see the sketch after this list):

  • Controller name (e.g., toolhive.stacklok.dev/virtual-mcp, kagenti.io/envoy-gateway)
  • Implementation-specific parameters
  • Shared capabilities and defaults
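
A very rough sketch of what this could look like as operator API types, mirroring GatewayClass and IngressClass. Every name and field below is illustrative, not a committed design:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// MCPGatewayClassSpec names the controller that implements gateways of
// this class and carries implementation-specific defaults.
// (Illustrative sketch only.)
type MCPGatewayClassSpec struct {
	// ControllerName is claimed by exactly one implementation, e.g.
	// "toolhive.stacklok.dev/virtual-mcp" or "kagenti.io/envoy-gateway".
	ControllerName string `json:"controllerName"`

	// Parameters holds implementation-specific configuration and defaults,
	// opaque to everything except the claiming controller.
	Parameters *runtime.RawExtension `json:"parameters,omitempty"`
}

// MCPGatewayClass would likely be cluster-scoped, like GatewayClass and
// IngressClass.
type MCPGatewayClass struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec MCPGatewayClassSpec `json:"spec,omitempty"`
}
```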

MCPGateway

User-facing resource (sketched below) that:

  • References an MCPGatewayClass
  • References MCPGroup for backend discovery
  • Defines common gateway concerns (service type, listeners)
  • Includes implementation-specific configuration
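
Continuing the sketch above (same illustrative package and imports), the user-facing resource might look like this; again, every field name is hypothetical:

```go
// MCPGatewaySpec captures common gateway concerns; anything beyond these
// would live in class parameters or the implementation-specific Config.
// (Illustrative sketch only, continuing the package above.)
type MCPGatewaySpec struct {
	// ClassName selects the MCPGatewayClass, and therefore the controller,
	// responsible for this gateway.
	ClassName string `json:"className"`

	// GroupRef names the MCPGroup whose member MCPServers this gateway
	// aggregates and exposes.
	GroupRef string `json:"groupRef"`

	// ServiceType controls how the gateway is exposed
	// (e.g. ClusterIP, LoadBalancer).
	ServiceType string `json:"serviceType,omitempty"`

	// Config carries implementation-specific configuration, interpreted
	// only by the controller that claims the referenced class.
	Config *runtime.RawExtension `json:"config,omitempty"`
}

type MCPGateway struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec MCPGatewaySpec `json:"spec,omitempty"`
}
```

The design choice borrowed from Gateway API and Ingress is that switching implementations would mean changing only the class reference (and possibly the implementation-specific config), while MCPGroup-based backend discovery stays identical across gateways.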

Benefits

  1. Reusability: Common abstractions reduce duplication across gateway implementations
  2. Interoperability: Standard backend discovery via MCPGroup works for any gateway
  3. Flexibility: Users can choose gateway implementations based on their needs
  4. Ecosystem Growth: Easier for community to build compatible gateway solutions
  5. Migration: Users can switch gateway implementations with minimal config changes
  6. Coexistence: Multiple gateway types already run side by side today; shared patterns would make that coexistence cleaner

Open Questions

  1. Is this abstraction worth pursuing? Do we see enough interest in multiple gateway implementations to justify the standardization effort?
  2. What belongs in the common spec? Which configuration should be standardized, and which left implementation-specific?
  3. Backend discovery interface: Should MCPGroup remain the primary discovery mechanism? Should gateways support other patterns?
  4. Migration strategy: How do we handle existing VirtualMCPServer resources?
  5. Adoption path: Would gateway implementers (including kagenti/mcp-gateway) benefit from and adopt these abstractions?
  6. Scope of standardization: How much should we standardize, and how much should we leave to individual implementations?

Next Steps

This is an exploratory idea to gather feedback on whether:

  • The abstraction provides meaningful value over current flexibility
  • The Gateway API pattern fits MCP gateway use cases
  • There's community/user interest in standardized gateway abstractions
  • The complexity is worth the improved interoperability

We'd love input from users who might want to:

  • Use alternative gateway implementations
  • Build custom MCP gateways
  • Switch between gateway implementations more easily
  • Integrate with existing service mesh or API gateway infrastructure

Related Resources

  • Current VirtualMCPServer: cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
  • MCPGroup: cmd/thv-operator/api/v1alpha1/mcpgroup_types.go
  • Backend discovery: pkg/vmcp/aggregator/discoverer.go
  • Virtual MCP proposal: docs/proposals/THV-2106-virtual-mcp-server.md
  • Kubernetes Gateway API: https://gateway-api.sigs.k8s.io/
  • kagenti/mcp-gateway: https://github.com/kagenti/mcp-gateway
