
Add support for llama-stack

Open · rhatdan opened this issue 6 months ago · 4 comments

Add a new option, --api, which allows users to specify the API server: either llama-stack or none. With none, we just generate a service with the serve command. With --api llama-stack, RamaLama will generate an API server listening on port 8321 and an OpenAI-compatible server listening on port 8080.
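
For illustration, here is a minimal sketch of how such an option could be wired into an argparse-based CLI. The helper name and help text are assumptions for illustration, not the actual RamaLama code:

```python
import argparse

def add_api_option(parser: argparse.ArgumentParser) -> None:
    # Hypothetical helper: exposes the --api choice described above.
    parser.add_argument(
        "--api",
        choices=["llama-stack", "none"],
        default="none",
        help="unified API layer to deploy (llama-stack) or plain model serving (none)",
    )

parser = argparse.ArgumentParser(prog="ramalama serve")
add_api_option(parser)
args = parser.parse_args(["--api", "llama-stack"])
print(args.api)  # -> llama-stack
```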

Summary by Sourcery

Add support for a unified API layer with a new --api option, implement llama-stack mode via a Stack class that generates and deploys a Kubernetes pod stack, refactor engine command helpers and label handling, update compute_serving_port and model_factory helpers, and refresh documentation and tests accordingly.

New Features:

  • Add --api option with choices llama-stack or none to unify API layer handling
  • Implement llama-stack mode to generate and deploy a multi-container Kubernetes stack for API serving
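
As a rough illustration of the multi-container deployment this implies, the sketch below emits a two-container Kubernetes pod: the llama-stack API server on 8321 and an OpenAI-compatible model server on 8080. The function name, image references, and exact YAML layout are assumptions for illustration, not the Stack class introduced by this PR:

```python
def generate_stack_yaml(name: str, model: str) -> str:
    # Hypothetical sketch: one pod, two containers, ports as described in the issue.
    return f"""\
apiVersion: v1
kind: Pod
metadata:
  name: {name}
spec:
  containers:
  - name: llama-stack          # unified API layer, port 8321
    image: example.io/llama-stack:latest
    ports:
    - containerPort: 8321
  - name: model-server         # OpenAI-compatible server for {model}, port 8080
    image: example.io/model-server:latest
    ports:
    - containerPort: 8080
"""

print(generate_stack_yaml("ramalama-stack", "my-model"))
```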

Enhancements:

  • Refactor engine label handling into a generic add_labels helper and simplify container manager commands (inspect, stop_container, container_connection)
  • Refactor compute_serving_port to accept args, respect the api option, and display appropriate REST API endpoints (see the sketch after this list)
  • Introduce New and Serve helpers in model_factory for consistent model instantiation
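
A hedged sketch of how a port helper like this could consult the new api setting; the port numbers follow the issue description above, and the real compute_serving_port in RamaLama may handle more cases:

```python
LLAMA_STACK_PORT = "8321"   # llama-stack API endpoint
OPENAI_PORT = "8080"        # OpenAI-compatible endpoint

def compute_serving_port(args) -> str:
    # Illustrative only: an explicit --port wins; otherwise pick the default
    # that matches the selected API mode.
    if getattr(args, "port", None):
        return args.port
    if getattr(args, "api", "none") == "llama-stack":
        return LLAMA_STACK_PORT
    return OPENAI_PORT
```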

Documentation:

  • Document the new api option in ramalama.conf and update CLI manpages for run and serve commands

Tests:

  • Add tests for compute_serving_port behavior with args and api settings and remove outdated stop_container test
  • Update config loading tests to include default and overridden api values
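
To show the shape such tests might take, here is a minimal pytest-style sketch with a stand-in loader; the real RamaLama config loader and test names differ:

```python
def load_config(overrides=None):
    # Stand-in for the real config loader: api defaults to "none".
    config = {"api": "none"}
    config.update(overrides or {})
    return config

def test_api_defaults_to_none():
    assert load_config()["api"] == "none"

def test_api_override_to_llama_stack():
    assert load_config({"api": "llama-stack"})["api"] == "llama-stack"
```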

rhatdan · May 15 '25, 21:05