
Multiple providers blocking the async event loop

bbrowning opened this issue 7 months ago

🐛 Describe the bug

Llama Stack uses FastAPI and an async event loop. FastAPI dispatches all requests to async request handlers on a single event loop. If that event loop gets blocked - for example, by performing a blocking operation inside a request handler - then all request handling in the server stops until the blocking operation completes. So, it's imperative that we never block the main request event loop.
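
A minimal standalone sketch (not taken from the Llama Stack codebase) of the difference. While `/blocking` is in its `time.sleep()`, every other request to the server is stalled; `/non_blocking` yields control back to the loop, so other requests keep being served:

```python
import asyncio
import time

from fastapi import FastAPI

app = FastAPI()


@app.get("/blocking")
async def blocking_handler():
    # time.sleep() blocks the whole event loop; no other request is handled
    # until these 5 seconds elapse.
    time.sleep(5)
    return {"status": "done"}


@app.get("/non_blocking")
async def non_blocking_handler():
    # asyncio.sleep() suspends only this handler and hands control back to
    # the event loop, so other requests continue to be handled concurrently.
    await asyncio.sleep(5)
    return {"status": "done"}
```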

Today, many of our providers have async request handlers that actually perform blocking operations, so we regularly block the event loop with disk I/O, network calls, compute-intensive work, and the like. Here's an inventory of the provider implementations that appear to be doing blocking operations in async methods:

  • providers/inline/datasetio/localfs/datasetio.py
    • blocking file operations
  • providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py
    • blocking torch operations in event loop
  • providers/inline/safety/prompt_guard/prompt_guard.py
    • blocking tokenization and torch operations
  • providers/inline/scoring/braintrust/braintrust.py
    • likely blocking calls to braintrust evaluators
  • providers/inline/tool_runtime/code_interpreter/code_interpreter.py
    • blocking calls to python code execution
  • providers/inline/vector_io/faiss/faiss.py
    • likely blocking calls to faiss index search
  • providers/inline/vector_io/sqlite_vec/sqlite_vec.py
    • blocking database operations (query, insert, etc)
  • providers/remote/datasetio/huggingface/huggingface.py
    • blocking network calls to huggingface
  • providers/remote/inference/bedrock/bedrock.py
    • blocking network calls via bedrock client
  • providers/remote/inference/databricks/databricks.py
    • blocking calls to OpenAI client
  • providers/remote/inference/fireworks/fireworks.py
    • likely some blocking calls in _stream_completion and embeddings
  • providers/remote/inference/passthrough/passthrough.py
    • blocking calls to passthrough LlamaStackClient
  • providers/remote/inference/runpod/runpod.py
    • blocking calls to OpenAI client
  • providers/remote/inference/sambanova/sambanova.py
    • blocking calls to OpenAI client
  • providers/remote/inference/together/together.py
    • blocking calls to Together client
  • providers/remote/safety/bedrock/bedrock.py
    • blocking network calls via bedrock client
  • providers/remote/tool_runtime/bing_search/bing_search.py
    • blocking network calls
  • providers/remote/tool_runtime/brave_search/brave_search.py
    • blocking network calls
  • providers/remote/tool_runtime/tavily_search/tavily_search.py
    • blocking network calls
  • providers/remote/tool_runtime/wolfram_alpha/wolfram_alpha.py
    • blocking network calls
  • providers/remote/vector_io/chroma/chroma.py
    • blocking calls when using local chroma client
  • providers/remote/vector_io/milvus/milvus.py
    • blocking calls with milvus client
  • providers/remote/vector_io/pgvector/pgvector.py
    • blocking SQL calls
  • providers/remote/vector_io/weaviate/weaviate.py
    • blocking calls with weaviate client
  • providers/utils/kvstore/mongodb/mongodb.py
    • blocking calls with MongoClient
  • providers/utils/kvstore/postgres/postgres.py
    • blocking calls with postgres client
  • providers/utils/inference/embedding_mixin.py
    • blocking loading and usage of embedding model
  • providers/utils/inference/litellm_openai_mixin.py
    • blocking calls to litellm

This list was compiled from a quick scan through the code, and I may have missed some. All of these need to be rewritten to either use truly async operations, or to move their blocking work into separate threads or processes and await its completion from the event loop.
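
Roughly speaking, the fixes would follow one of two patterns. Here is a hedged sketch, where `search_index` is a hypothetical stand-in for any of the blocking calls above (a faiss search, a synchronous DB query, an embedding model forward pass, etc.), not an actual Llama Stack function:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def search_index(query: str) -> list[str]:
    # Placeholder for a blocking operation (e.g. a synchronous DB query or a
    # faiss index search); imagine this takes hundreds of milliseconds.
    return [query]


# Pattern 1: offload blocking (mostly I/O-bound) work to a worker thread and
# await it, keeping the event loop free.
async def query_index(query: str) -> list[str]:
    return await asyncio.to_thread(search_index, query)


# Pattern 2: offload CPU-heavy work to a separate process and await it, since
# threads don't help when the work holds the GIL.
_process_pool = ProcessPoolExecutor()


async def query_index_cpu_heavy(query: str) -> list[str]:
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_process_pool, search_index, query)
```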

Expected behavior

We should not block the event loop, so that a single Llama Stack server can handle a reasonable number of concurrent requests.

bbrowning · Mar 07 '25