llama-stack
Adds SambaNova Cloud Inference Adapter
## What does this PR do?
This PR adds a SambaNova inference adapter that enables integration with SambaNova's AI models through their OpenAI-compatible API.
Key features implemented:
- Chat completion API with streaming support
- Function/tool calling for supported models (3.1 series: 8B, 70B, 405B)
- Support for multiple Llama 3 models (1B and 3B, plus the 11B and 90B vision models, among others)
- Distribution template based on the examples
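Since the adapter targets SambaNova's OpenAI-compatible API, tool definitions for the 3.1-series models follow the OpenAI function-calling schema. A minimal sketch (the `get_weather` tool and its parameter schema are illustrative, not taken from this PR):

```python
# Illustrative tool definition in the OpenAI function-calling format,
# as accepted by the 3.1-series models through the adapter.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```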
What it does not support:
- Text completion API
- Embeddings API
- Certain OpenAI features (logprobs, presence/frequency penalties, parallel tool calls, etc.)
- Response format options (JSON mode)
## Test Plan
Run the following test command:

```shell
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py --env SAMBANOVA_API_KEY=<your-api-key>
```
To test the distribution template:

```shell
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml --port 5001 --env SAMBANOVA_API_KEY=<your-api-key>
```
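For reference, the inference section of the generated `run.yaml` would take roughly the following shape; the field names mirror other remote inference providers and are an assumption, not copied from this PR:

```yaml
# Hypothetical run.yaml fragment for the sambanova provider;
# field names are assumed from the usual remote-provider layout.
providers:
  inference:
    - provider_id: sambanova
      provider_type: remote::sambanova
      config:
        url: https://api.sambanova.ai/v1
        api_key: ${env.SAMBANOVA_API_KEY}
```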
## Sources
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the contributor guideline, Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.