llama-stack
Add Groq inference provider
What does this PR do?
This PR adds a Groq inference provider, enabling integration with Groq's hosted inference for Llama models. Groq exposes an OpenAI-compatible endpoint.
Added support for Chat Completions with:
- Llama 3.0 8B & 70B
- Llama 3.1 8B & 70B
- Llama 3.2 1B, 3B, 11B, and 90B
The integration includes support for streaming, JSON mode, and tool calling.
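Because the endpoint is OpenAI-compatible, the request shape for these features follows the standard chat-completions schema. The helper below is a hypothetical sketch (not code from this PR) illustrating how streaming, JSON mode, and tool calling map onto request fields:

```python
def build_groq_chat_request(messages, model, stream=False, json_mode=False, tools=None):
    """Build a request body for Groq's OpenAI-compatible /chat/completions.

    Hypothetical helper for illustration only; the field names follow the
    OpenAI chat-completions schema that Groq mirrors.
    """
    body = {"model": model, "messages": messages, "stream": stream}
    if json_mode:
        # JSON mode constrains the model's output to valid JSON
        body["response_format"] = {"type": "json_object"}
    if tools:
        # Tool calling: tool schemas are passed through unchanged
        body["tools"] = tools
    return body


# Example: a streaming request in JSON mode (model id is illustrative)
body = build_groq_chat_request(
    messages=[{"role": "user", "content": "Reply with a JSON greeting"}],
    model="llama-3.1-8b-instant",
    stream=True,
    json_mode=True,
)
```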
Missing support:
- Completions (non-chat)
- `top_k` and `repetition_penalty` sampling parameters
- Embeddings
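Since `top_k` and `repetition_penalty` are not accepted by the endpoint, a provider has to drop them when translating sampling parameters. A minimal sketch of that mapping (a hypothetical helper, not the PR's actual implementation):

```python
import warnings


def convert_sampling_params(params: dict) -> dict:
    """Map generic sampling params to Groq's OpenAI-compatible fields.

    Hypothetical sketch: top_k and repetition_penalty are assumed
    unsupported by the endpoint, so they are dropped with a warning.
    """
    unsupported = {"top_k", "repetition_penalty"}
    out = {}
    for key, value in params.items():
        if key in unsupported:
            warnings.warn(f"Groq does not support '{key}'; dropping it")
        else:
            out[key] = value
    return out
```

Silently ignoring unsupported parameters can mask bugs, so warning loudly while still serving the request is a reasonable middle ground.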
Test Plan
Groq has been added to the existing test plan. You can run it with the following command:

```shell
GROQ_API_KEY=<api-key> pytest -s -v --providers inference=groq llama_stack/providers/tests/inference/test_text_inference.py
```
You can get a Groq API key for free here: https://console.groq.com/keys
10 tests pass, 6 are skipped, none fail.
Sources
- Documentation: https://console.groq.com/docs/overview
- API Reference: https://console.groq.com/docs/api-reference#chat-create
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the contributor guideline, Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.