Add support for llama-stack
Add a new option, --api, which lets users specify the API server: either llama-stack or none. With none, RamaLama simply generates a service with the serve command. With --api llama-stack, RamaLama generates a llama-stack API server listening on port 8321 and an OpenAI-compatible server listening on port 8080.
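To make the behaviour concrete, here is a minimal, hypothetical sketch of how such an option could be wired up with argparse. The parser layout, default, and print statements are illustrative assumptions, not RamaLama's actual code:

```python
# Hypothetical sketch (not the real RamaLama CLI): register the --api
# option on a serve-like subcommand and branch on its value.
import argparse

parser = argparse.ArgumentParser(prog="ramalama serve")
parser.add_argument(
    "--api",
    choices=["llama-stack", "none"],
    default="none",
    help="API layer to expose: llama-stack, or none for a plain serve",
)
parser.add_argument("MODEL")
args = parser.parse_args()

if args.api == "llama-stack":
    # Stand up the llama-stack API server on 8321 and an
    # OpenAI-compatible server on 8080 (ports taken from this description).
    print("llama-stack API on :8321, OpenAI-compatible API on :8080")
else:
    # Plain serve: just run the model service.
    print("serving model", args.MODEL)
```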
Summary by Sourcery
Add support for a unified API layer with a new --api option, implement llama-stack mode via a Stack class that generates and deploys a Kubernetes pod stack, refactor engine command helpers and label handling, update compute_serving_port and model_factory helpers, and refresh documentation and tests accordingly
New Features:
- Add --api option with choices llama-stack or none to unify API layer handling
- Implement llama-stack mode to generate and deploy a multi-container Kubernetes stack for API serving
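As a rough illustration of the llama-stack mode, the sketch below shows one way a Stack-style helper could render a two-container Kubernetes pod (llama-stack on 8321, an OpenAI-compatible model server on 8080). The class name, template, and image fields are assumptions for illustration, not the actual generated manifest:

```python
# Illustrative-only sketch of a helper that emits a single Kubernetes pod
# with two containers: the llama-stack API (8321) and the model server (8080).
from dataclasses import dataclass

POD_TEMPLATE = """\
apiVersion: v1
kind: Pod
metadata:
  name: {name}
spec:
  containers:
  - name: llama-stack
    image: {stack_image}
    ports:
    - containerPort: 8321
  - name: model-server
    image: {model_image}
    ports:
    - containerPort: 8080
"""

@dataclass
class Stack:
    name: str
    stack_image: str
    model_image: str

    def generate(self) -> str:
        # Render the pod manifest for this stack.
        return POD_TEMPLATE.format(
            name=self.name,
            stack_image=self.stack_image,
            model_image=self.model_image,
        )

print(Stack("ramalama-stack", "stack:latest", "model:latest").generate())
```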
Enhancements:
- Refactor engine label handling into a generic add_labels helper and simplify container manager commands (inspect, stop_container, container_connection)
- Refactor compute_serving_port to accept args, respect the api option, and display the appropriate REST API endpoints (a sketch follows this list)
- Introduce New and Serve helpers in model_factory for consistent model instantiation
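A hedged sketch of the compute_serving_port idea above, assuming a signature that takes the parsed args and honours the api option; the exact signature, fallback logic, and message text are guesses, not the real implementation:

```python
# Hypothetical compute_serving_port: pick the port to report based on --api.
def compute_serving_port(args, quiet=False):
    if getattr(args, "api", "none") == "llama-stack":
        port = "8321"  # llama-stack API server port from this PR description
    else:
        port = getattr(args, "port", None) or "8080"  # OpenAI-compatible default
    if not quiet:
        print(f"REST API available at http://localhost:{port}")
    return port
```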
Documentation:
- Document the new api option in ramalama.conf and update CLI manpages for run and serve commands
Tests:
- Add tests for compute_serving_port behavior with args and api settings, and remove the outdated stop_container test (see the example after this list)
- Update config loading tests to include default and overridden api values
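For flavour, a minimal pytest-style test of the kind described above, assuming the hypothetical compute_serving_port sketch from the Enhancements section is in scope; the assertions mirror that sketch, not RamaLama's actual test suite:

```python
# Sketch of a test exercising the hypothetical compute_serving_port above.
from types import SimpleNamespace

def test_compute_serving_port_respects_api():
    stack_args = SimpleNamespace(api="llama-stack", port=None)
    plain_args = SimpleNamespace(api="none", port="9090")
    assert compute_serving_port(stack_args, quiet=True) == "8321"
    assert compute_serving_port(plain_args, quiet=True) == "9090"
```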