Add Groq inference provider

bklieger-groq opened this pull request 11 months ago • 0 comments

What does this PR do?

This PR adds a Groq inference provider that allows integration with Groq's AI inference offerings for Llama models. Groq has an OpenAI-compatible endpoint.
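Because the endpoint is OpenAI-compatible, a chat completion request is an ordinary OpenAI-style HTTP call pointed at Groq's base URL. The sketch below is illustrative, not part of this PR: the base URL and model identifier come from Groq's public docs, and `build_chat_request` is a hypothetical helper.

```python
# Sketch: calling Groq's OpenAI-compatible chat endpoint with the stdlib.
# The base URL and default model id follow Groq's public documentation;
# treat the exact identifiers as assumptions.
import json
import urllib.request

GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(api_key, messages, model="llama3-8b-8192", stream=False):
    """Build an OpenAI-style chat completion request for Groq (hypothetical helper)."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        f"{GROQ_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Example usage (requires a real GROQ_API_KEY):
# req = build_chat_request(os.environ["GROQ_API_KEY"],
#                          [{"role": "user", "content": "Hello"}])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```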

Added support for Chat Completions with:

  1. Llama 3 8B & 70B
  2. Llama 3.1 8B & 70B
  3. Llama 3.2 1B, 3B, 11B, and 90B

The integration includes support for streaming, JSON mode, and tool calling.
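For reference, the three features map onto standard OpenAI-style request fields. The sketch below shows the payload shape only; the field names follow the OpenAI chat API, while the model id, helper name, and tool schema are illustrative assumptions, not code from this PR.

```python
# Sketch of the OpenAI-compatible request fields behind the features above.
# Field names follow the OpenAI chat API; the model id and the example
# tool definition are assumptions for illustration.
def chat_payload(messages, model="llama-3.1-8b-instant",
                 stream=False, json_mode=False, tools=None):
    payload = {"model": model, "messages": messages, "stream": stream}
    if json_mode:
        # JSON mode: constrains the model to emit a valid JSON object.
        payload["response_format"] = {"type": "json_object"}
    if tools:
        # Tool calling: OpenAI-style function tool definitions.
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    return payload

# Hypothetical tool definition, for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```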


Missing support:

  1. Completions (non-Chat completions)
  2. The top_k and repetition_penalty sampling parameters
  3. Embeddings
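In practice, a provider that cannot forward every sampling parameter has to drop the unsupported ones rather than send them upstream. The helper below is a minimal sketch of that mapping, not the PR's actual implementation; the function name and the set of supported fields are assumptions.

```python
# Sketch: map generic sampling params to Groq request kwargs, dropping
# fields Groq does not accept. Hypothetical helper; the supported set
# is an assumption based on the limitations listed above.
import warnings

SUPPORTED_PARAMS = {"temperature", "top_p", "max_tokens"}
UNSUPPORTED_PARAMS = {"top_k", "repetition_penalty"}

def map_sampling_params(params):
    """Return only the sampling params Groq accepts, warning on the rest."""
    out = {}
    for key, value in params.items():
        if key in SUPPORTED_PARAMS:
            out[key] = value
        elif key in UNSUPPORTED_PARAMS:
            # Silently sending these would be misleading; warn and drop.
            warnings.warn(f"{key} is not supported by the Groq API; ignoring")
    return out
```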

Test Plan

Groq has been added to the existing test plan. You can run it with the following command:

GROQ_API_KEY=<api-key> pytest -s -v --providers inference=groq llama_stack/providers/tests/inference/test_text_inference.py

You can get a Groq API key for free here: https://console.groq.com/keys

10 tests pass, 6 are skipped, none fail.

Sources

Documentation: https://console.groq.com/docs/overview
API Reference: https://console.groq.com/docs/api-reference#chat-create

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Ran pre-commit to handle lint / formatting issues.
  • [x] Read the contributor guideline, Pull Request section?
  • [ ] Updated relevant documentation.
  • [x] Wrote necessary unit or integration tests.

bklieger-groq · Nov 26 '24 02:11