Ashwin Bharambe
We should use the Inference APIs to execute Llama Guard instead of needing to use HuggingFace APIs directly. The actual inference is then handled by the Inference implementation.
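A minimal sketch of what that delegation could look like, assuming an Inference provider that exposes a `chat_completion()` method; the class name, model identifier, and response shape below are illustrative, not the actual implementation:

```python
class LlamaGuardShield:
    """Sketch of a safety shield that delegates generation to an Inference provider."""

    def __init__(self, inference_api, model: str = "Llama-Guard-3-8B"):
        self.inference_api = inference_api  # any provider implementing chat_completion()
        self.model = model

    async def run(self, user_message: str) -> bool:
        # Ask the guard model to classify the message; the exact prompt template
        # Llama Guard expects is omitted here for brevity.
        response = await self.inference_api.chat_completion(
            model=self.model,
            messages=[{"role": "user", "content": user_message}],
        )
        # Llama Guard replies with "safe" or "unsafe" plus violated categories.
        return response.completion_message.content.strip().lower().startswith("safe")
```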
In the previous design, the server endpoint at the top-most level extracted the headers from the request and set provider data (e.g., private keys) that the implementations could retrieve using...
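For context, the pattern being described — headers parsed once at the server boundary and then read by providers deeper in the call stack — is commonly implemented with a context variable. A sketch under that assumption (the header name and helper names here are illustrative, not the repository's actual ones):

```python
import json
from contextvars import ContextVar

_provider_data: ContextVar[dict] = ContextVar("provider_data", default={})

def set_request_provider_data(headers: dict) -> None:
    # Called once per request by the top-level server endpoint.
    payload = headers.get("X-LlamaStack-ProviderData")  # illustrative header name
    _provider_data.set(json.loads(payload) if payload else {})

def get_request_provider_data() -> dict:
    # Called from inside a provider implementation to read e.g. private API keys.
    return _provider_data.get()
```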
Most of the current inference providers only implement the `chat_completion()` method. The `completion()` method raises a `NotImplementedError`. We should implement this method for all the inference providers:
- meta-reference
- ...
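As a rough sketch of the shape this could take — with simplified stand-in types and a hypothetical `_generate()` helper in place of the real model call:

```python
from dataclasses import dataclass

@dataclass
class CompletionResponse:
    # Simplified stand-in for the real API response type.
    content: str
    stop_reason: str

class InferenceProvider:
    async def chat_completion(self, model: str, messages: list[dict]) -> CompletionResponse:
        raise NotImplementedError

    async def completion(self, model: str, content: str) -> CompletionResponse:
        # Current behavior in most providers: not implemented.
        raise NotImplementedError(f"completion() not supported by {type(self).__name__}")

class MetaReferenceInference(InferenceProvider):
    async def completion(self, model: str, content: str) -> CompletionResponse:
        # One possible implementation: run raw text completion directly against
        # the underlying generator instead of going through the chat template.
        text = await self._generate(content)
        return CompletionResponse(content=text, stop_reason="end_of_turn")

    async def _generate(self, prompt: str) -> str:
        # Model-specific generation elided in this sketch.
        return ""
```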
This PR makes several core changes to the developer experience surrounding Llama Stack.

**Background:** PR https://github.com/meta-llama/llama-stack/pull/92 introduced the notion of "routing" to the Llama Stack. It introduced three object types:...
Added support for structured output in the API and added a reference implementation for meta-reference. A few notes:
- Two formats are specified in the API: JSON schema and EBNF...
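A sketch of how the JSON-schema flavor could be requested from a client; the client class exists in `llama-stack-client`, but the exact parameter names (`model`, `response_format`) and the response-format shape below are assumptions to be checked against the API:

```python
# Sketch only: requesting JSON-schema-constrained output through the client.
import json
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "year_born": {"type": "string"}},
    "required": ["name", "year_born"],
}

response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Who wrote the book Charlotte's Web? Answer as JSON."}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(json.loads(response.completion_message.content))
```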
# What does this PR do?

Significantly simplifies running tests. Previously you ran tests by doing:

```bash
MODEL_ID=<model-id> PROVIDER_ID=<provider-id> PROVIDER_CONFIG=config.yaml pytest -s llama_stack/providers/tests/inference/test_inference.py
```

This was pretty annoying because:
- ...
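One common way to move configuration like this out of environment variables is to surface it through pytest options and fixtures instead. A sketch of that general technique — not necessarily the exact mechanism this PR adopts; option and fixture names are illustrative:

```python
# conftest.py -- sketch of replacing ad-hoc environment variables with pytest options.
import pytest

def pytest_addoption(parser):
    parser.addoption("--inference-model", default="Llama3.1-8B-Instruct",
                     help="Model to run inference tests against")
    parser.addoption("--provider-config", default=None,
                     help="Optional path to a provider config YAML")

@pytest.fixture
def inference_model(request):
    return request.config.getoption("--inference-model")

@pytest.fixture
def provider_config(request):
    return request.config.getoption("--provider-config")
```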
### System Info

...

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### 🐛 Describe the bug

vLLM does not work when...
### 🚀 The feature, motivation and pitch

Inference providers (fireworks, together, meta-reference) support guided decoding (specifying a JSON schema, for example, as a "grammar" for decoding) with inference. vLLM supports this functionality --...
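For reference, a sketch of how guided decoding can be requested from a vLLM OpenAI-compatible server today; the `guided_json` extra parameter is vLLM-specific and its exact name should be checked against the vLLM version in use:

```python
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"guided_json": schema},  # vLLM-specific guided decoding hook
)
print(response.choices[0].message.content)
```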
### 🚀 The feature, motivation and pitch

We have a decently flexible testing system for testing various combinations of providers when composing a Llama Stack. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/tests/README.md. We need to...
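As a rough illustration of how provider combinations can be exercised with pytest parametrization — provider names and the test body here are illustrative, not the repository's actual fixtures:

```python
import itertools
import pytest

INFERENCE_PROVIDERS = ["meta-reference", "fireworks", "together"]
SAFETY_PROVIDERS = ["llama-guard", "prompt-guard"]

@pytest.mark.parametrize(
    "inference_provider,safety_provider",
    list(itertools.product(INFERENCE_PROVIDERS, SAFETY_PROVIDERS)),
)
def test_stack_composition(inference_provider, safety_provider):
    # A real test would compose a stack from this provider combination and run
    # an end-to-end request; here we only assert the pairing is well-formed.
    assert inference_provider and safety_provider
```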