llama-stack
Add Groq inference provider
What does this PR do?
This PR adds a Groq inference provider, enabling integration with Groq's hosted inference for Llama models. Groq exposes an OpenAI-compatible endpoint.
Added support for Chat Completions with:
- Llama 3.0 8B & 70B
- Llama 3.1 8B & 70B
- Llama 3.2 1B, 3B, 11B, and 90B
The integration includes support for streaming, JSON mode, and tool calling.
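Because the endpoint is OpenAI-compatible, the request shape for these features follows the standard chat-completions schema. The helper below is a hypothetical sketch (not code from this PR) illustrating how streaming, JSON mode, and tool calling map onto request fields:

```python
def build_groq_chat_request(messages, model, stream=False, json_mode=False, tools=None):
    """Build a request body for Groq's OpenAI-compatible /chat/completions.

    Hypothetical helper for illustration only; the field names follow the
    OpenAI chat-completions schema that Groq mirrors.
    """
    body = {"model": model, "messages": messages, "stream": stream}
    if json_mode:
        # JSON mode constrains the model's output to valid JSON
        body["response_format"] = {"type": "json_object"}
    if tools:
        # Tool calling: tool schemas are passed through unchanged
        body["tools"] = tools
    return body


# Example: a streaming request in JSON mode (model id is illustrative)
body = build_groq_chat_request(
    messages=[{"role": "user", "content": "Reply with a JSON greeting"}],
    model="llama-3.1-8b-instant",
    stream=True,
    json_mode=True,
)
```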
Missing support:
- Completions (non-chat)
- `top_k` and `repetition_penalty` sampling parameters
- Embeddings
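Since `top_k` and `repetition_penalty` are not accepted by the endpoint, a provider has to drop them when translating sampling parameters. A minimal sketch of that mapping (a hypothetical helper, not the PR's actual implementation):

```python
import warnings


def convert_sampling_params(params: dict) -> dict:
    """Map generic sampling params to Groq's OpenAI-compatible fields.

    Hypothetical sketch: top_k and repetition_penalty are assumed
    unsupported by the endpoint, so they are dropped with a warning.
    """
    unsupported = {"top_k", "repetition_penalty"}
    out = {}
    for key, value in params.items():
        if key in unsupported:
            warnings.warn(f"Groq does not support '{key}'; dropping it")
        else:
            out[key] = value
    return out
```

Silently ignoring unsupported parameters can mask bugs, so warning loudly while still serving the request is a reasonable middle ground.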
Test Plan
Groq has been added to the existing test plan. You can run it with the following command:

```shell
GROQ_API_KEY=<api-key> pytest -s -v --providers inference=groq llama_stack/providers/tests/inference/test_text_inference.py
```
You can get a Groq API key for free here: https://console.groq.com/keys
10 tests pass, 6 are skipped, none fail.
Sources
- Documentation: https://console.groq.com/docs/overview
- API Reference: https://console.groq.com/docs/api-reference#chat-create
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the contributor guideline, Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.