
Add support for llama-stack

Open · rhatdan opened this issue 6 months ago · 4 comments

Add a new option, --api, which allows users to specify the API server: either llama-stack or none. With none, we just generate a service with the serve command. With --api llama-stack, RamaLama will generate an API server listening on port 8321 and an OpenAI-compatible server listening on port 8080.
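
For illustration, here is a minimal sketch of how such an option could be wired into an argparse-based CLI. The helper name and help text are assumptions for illustration, not the actual RamaLama code:

```python
import argparse

def add_api_option(parser: argparse.ArgumentParser) -> None:
    # Hypothetical helper: exposes the --api choice described above.
    parser.add_argument(
        "--api",
        choices=["llama-stack", "none"],
        default="none",
        help="unified API layer to deploy (llama-stack) or plain model serving (none)",
    )

parser = argparse.ArgumentParser(prog="ramalama serve")
add_api_option(parser)
args = parser.parse_args(["--api", "llama-stack"])
print(args.api)  # -> llama-stack
```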

Summary by Sourcery

Add support for a unified API layer with a new --api option, implement llama-stack mode via a Stack class that generates and deploys a Kubernetes pod stack, refactor engine command helpers and label handling, update compute_serving_port and model_factory helpers, and refresh documentation and tests accordingly.

New Features:

  • Add --api option with choices llama-stack or none to unify API layer handling
  • Implement llama-stack mode to generate and deploy a multi-container Kubernetes stack for API serving
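
As a rough illustration of the multi-container deployment this implies, the sketch below emits a two-container Kubernetes pod: the llama-stack API server on 8321 and an OpenAI-compatible model server on 8080. The function name, image references, and exact YAML layout are assumptions for illustration, not the Stack class introduced by this PR:

```python
def generate_stack_yaml(name: str, model: str) -> str:
    # Hypothetical sketch: one pod, two containers, ports as described in the issue.
    return f"""\
apiVersion: v1
kind: Pod
metadata:
  name: {name}
spec:
  containers:
  - name: llama-stack          # unified API layer, port 8321
    image: example.io/llama-stack:latest
    ports:
    - containerPort: 8321
  - name: model-server         # OpenAI-compatible server for {model}, port 8080
    image: example.io/model-server:latest
    ports:
    - containerPort: 8080
"""

print(generate_stack_yaml("ramalama-stack", "my-model"))
```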

Enhancements:

  • Refactor engine label handling into a generic add_labels helper and simplify container manager commands (inspect, stop_container, container_connection)
  • Refactor compute_serving_port to accept args, respect the api option, and display appropriate REST API endpoints (see the sketch after this list)
  • Introduce New and Serve helpers in model_factory for consistent model instantiation
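
A hedged sketch of how a port helper like this could consult the new api setting; the port numbers follow the issue description above, and the real compute_serving_port in RamaLama may handle more cases:

```python
LLAMA_STACK_PORT = "8321"   # llama-stack API endpoint
OPENAI_PORT = "8080"        # OpenAI-compatible endpoint

def compute_serving_port(args) -> str:
    # Illustrative only: an explicit --port wins; otherwise pick the default
    # that matches the selected API mode.
    if getattr(args, "port", None):
        return args.port
    if getattr(args, "api", "none") == "llama-stack":
        return LLAMA_STACK_PORT
    return OPENAI_PORT
```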

Documentation:

  • Document the new api option in ramalama.conf and update CLI manpages for run and serve commands

Tests:

  • Add tests for compute_serving_port behavior with args and api settings and remove outdated stop_container test
  • Update config loading tests to include default and overridden api values
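
To show the shape such tests might take, here is a minimal pytest-style sketch with a stand-in loader; the real RamaLama config loader and test names differ:

```python
def load_config(overrides=None):
    # Stand-in for the real config loader: api defaults to "none".
    config = {"api": "none"}
    config.update(overrides or {})
    return config

def test_api_defaults_to_none():
    assert load_config()["api"] == "none"

def test_api_override_to_llama_stack():
    assert load_config({"api": "llama-stack"})["api"] == "llama-stack"
```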

rhatdan · May 15 '25, 21:05