llama-stack
                                
Adds Groq inference adapter.
What does this PR do?
This PR adds a Groq inference adapter.
Key features implemented:
- Chat completion API with streaming support
- Distribution template for easy deployment
What it does not support:
- Text completion API
- Embeddings API
- Certain OpenAI features:
  - logprobs and top_logprobs
  - response_format options
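For context on what the adapter is talking to: Groq exposes an OpenAI-compatible chat completions endpoint, so the request shape is the familiar OpenAI one. The sketch below is illustrative only (the helper name and model id are hypothetical, not the adapter's actual code) and shows roughly the request such an adapter builds:

```python
import json
import os
import urllib.request

# Groq's public OpenAI-compatible chat completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(model, messages, stream=False):
    """Build the HTTP request for an OpenAI-style chat completion call.

    Hypothetical helper for illustration; the real adapter lives in
    llama_stack/providers and uses its own client code.
    """
    body = json.dumps({"model": model, "messages": messages, "stream": stream}).encode()
    return urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama3-8b-8192", [{"role": "user", "content": "Hello"}])
print(req.full_url)  # → https://api.groq.com/openai/v1/chat/completions
```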
 
Test Plan
Run the following test command:
pytest -s -v --providers inference=groq llama_stack/providers/tests/inference/ --env GROQ_API_KEY=<your-api-key>
To test the distribution template:
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-groq \
  --port $LLAMA_STACK_PORT \
  --env GROQ_API_KEY=$GROQ_API_KEY
# Conda
llama stack build --template groq --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env GROQ_API_KEY=$GROQ_API_KEY
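When streaming is enabled, chunks come back as server-sent-events lines in the OpenAI streaming format that Groq emits. A minimal decoding sketch (illustrative, not the adapter's actual parsing code; field names assume the OpenAI streaming schema):

```python
import json

def parse_sse_line(line):
    """Return the text delta from one 'data: ...' SSE line, or None.

    Illustrative helper: handles the '[DONE]' sentinel and ignores
    non-data lines such as keep-alive comments.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)["choices"][0]["delta"].get("content")

print(parse_sse_line('data: {"choices":[{"delta":{"content":"Hi"}}]}'))  # → Hi
```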
Sources
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the contributor guideline, Pull Request section
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.