llama-stack

Composable building blocks to build Llama Apps

Results 360 llama-stack issues

### 🚀 Describe the new functionality needed ### Summary As discussed in #1061, this RFC introduces the design and endpoint specification for managing, invoking and serving document preprocessors. A preprocessor...

enhancement
RAG
stale

### 🚀 Describe the new functionality needed This issue proposes integrating Hyperparameter Optimization (HPO) into the Llama stack to enhance model performance tuning and improve efficiency in parameter selection. ###...

enhancement
RAG
stale

### 🚀 Describe the new functionality needed - We currently perform ad hoc preprocessing and ingestion of documents attached to an agent on the fly. Code Pointer: https://github.com/meta-llama/llama-stack/blob/33b096cc21e48910cf05f0c3e513032adb99fa84/llama_stack/providers/inline/agents/meta_reference/agent_instance.py#L922-L930 - We should...

enhancement
RAG
stale

### System Info ``` PyTorch version: 2.6.0+cu124 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM used to build PyTorch: N/A OS: Fedora Linux 40 (Workstation Edition) (x86_64)...

bug
stale

# What does this PR do? This PR adds the keyword search implementation for Milvus. Along with the implementation for remote Milvus, the tests require us to start a Milvus...

CLA Signed

### 🐛 Describe the bug Llama Stack uses FastAPI and an async event loop. FastAPI uses a single event loop to dispatch requests to all async request handlers. If this...

bug
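
The single-event-loop behavior described in the report above can be illustrated with plain `asyncio`. This is a minimal sketch, not Llama Stack or FastAPI code: `blocking_handler`, `cooperative_handler`, and `max_tick_gap` are hypothetical names, and `time.sleep` stands in for any synchronous call made inside an async handler. A background ticker measures how long the loop goes without scheduling other work.

```python
import asyncio
import time


async def blocking_handler(delay: float) -> None:
    # Hypothetical handler: time.sleep() never yields to the event loop,
    # so no other coroutine can run until it returns.
    time.sleep(delay)


async def cooperative_handler(delay: float) -> None:
    # Hypothetical handler: asyncio.sleep() yields, so other tasks keep running.
    await asyncio.sleep(delay)


async def max_tick_gap(handler, delay: float = 0.2, tick: float = 0.01) -> float:
    """Run `handler` next to a ticker task and return the longest gap between ticks."""
    gaps: list[float] = []
    stop = asyncio.Event()

    async def ticker() -> None:
        last = time.monotonic()
        while not stop.is_set():
            await asyncio.sleep(tick)
            now = time.monotonic()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    await handler(delay)
    stop.set()
    await task
    return max(gaps)


if __name__ == "__main__":
    blocked = asyncio.run(max_tick_gap(blocking_handler))
    cooperative = asyncio.run(max_tick_gap(cooperative_handler))
    print(f"longest stall with blocking handler:    {blocked:.3f}s")
    print(f"longest stall with cooperative handler: {cooperative:.3f}s")
```

With the blocking handler, the ticker's longest gap is roughly the full `delay`, because every other task on the loop waits behind it. With the cooperative handler, the gap stays close to `tick`.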

### System Info GPU Type: NVIDIA A100 OS: Ubuntu 24.04 CUDA: 12.8 ### Information - [ ] The official example scripts - [x] My own modified scripts ### 🐛 Describe...

bug
stale

### System Info PyTorch version: 2.7.0+cpu Is debug build: False CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.5 LTS (x86_64) GCC version: (Ubuntu...

bug

### 🚀 Describe the new functionality needed As of now, RAG ingestion chunks documents using a trivial overlapping-chunks algorithm and converts PDFs (and PDFs only)...

enhancement
RAG
stale

# What does this PR do? Converts blocking calls to async calls within the following providers/components: - runpod (inference) - sentence_transformers (inference) - litellm (inference) [//]: # (If resolving an...

CLA Signed
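
One common way to convert a blocking call into an async one is to run it on a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below shows that pattern only; it is an assumption about the general technique, not the PR's actual changes. `blocking_embed`, `embed_async`, and `embed_many` are hypothetical names, and `time.sleep` stands in for a synchronous provider call.

```python
import asyncio
import threading
import time


def blocking_embed(text: str) -> tuple[int, int]:
    """Hypothetical synchronous provider call; returns (result, worker thread id)."""
    time.sleep(0.05)  # stand-in for blocking I/O or compute
    return len(text), threading.get_ident()


async def embed_async(text: str) -> tuple[int, int]:
    # Run the blocking function on a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_embed, text)


async def embed_many(texts: list[str]) -> list[tuple[int, int]]:
    # The wrapped calls can now overlap instead of running back to back.
    return await asyncio.gather(*(embed_async(t) for t in texts))


if __name__ == "__main__":
    for length, thread_id in asyncio.run(embed_many(["a", "bcd", "hello"])):
        print(length, thread_id != threading.get_ident())
```

Offloading to a thread keeps the loop responsive, but it does not make the work itself faster. Where a library ships a native async client, that is usually the better choice.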