Roy Belio
on it
https://github.com/Blaizzy/mlx-vlm/pull/268/
A few notes here: at least originally, this was intentional: test_inference_client_caching.py:128-139 explicitly verifies that clients are NOT cached, because API keys can come from different users per-request via the...
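A minimal sketch of the caching concern, assuming an illustrative `InferenceClient` and helper names that are not the actual llama-stack code: an unkeyed memoized client would leak one user's API key into another request, whereas per-request construction (what the test verifies) or keying any future cache on the API key preserves isolation.

```python
from functools import lru_cache


class InferenceClient:
    """Illustrative stand-in for a provider client built around an API key."""

    def __init__(self, api_key: str):
        self.api_key = api_key


def get_client_per_request(api_key: str) -> InferenceClient:
    # Current, intentional behavior: a fresh client per request, so a
    # per-request API key never leaks across users.
    return InferenceClient(api_key)


@lru_cache(maxsize=None)
def get_client_keyed_cache(api_key: str) -> InferenceClient:
    # If caching were ever reintroduced, keying the cache on the API key
    # would keep the per-user isolation the test asserts.
    return InferenceClient(api_key)


assert get_client_per_request("key-a") is not get_client_per_request("key-a")
assert get_client_keyed_cache("key-a") is get_client_keyed_cache("key-a")
assert get_client_keyed_cache("key-a") is not get_client_keyed_cache("key-b")
```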
@franciscojavierarceo have this and #4021 been addressed in any way? I'd like to take on the effort
/assign This is a legitimate bug that creates confusion for users. The bug exists in two locations: vector_store.py:144 - The `content_from_doc()` function uses a regex pattern that matches file://: `pattern...
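A hedged illustration of the kind of scheme-matching regex described above (the actual `pattern` at vector_store.py:144 is truncated here, so the one below is assumed): a pattern that also matches `file://` makes local paths look like fetchable URLs.

```python
import re

# Assumed pattern for illustration only; it matches file:// in addition to http(s)://.
URL_PATTERN = re.compile(r"^(https?|file)://")


def looks_like_fetchable_url(content: str) -> bool:
    """Returns True for file:// URIs too, which is the source of the confusion."""
    return bool(URL_PATTERN.match(content))


print(looks_like_fetchable_url("file:///tmp/notes.txt"))       # True, but it is a local path
print(looks_like_fetchable_url("https://example.com/doc"))     # True
print(looks_like_fetchable_url("plain inline document text"))  # False
```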
The bug is at [streaming.py:573](https://github.com/llamastack/llama-stack/blob/aac494c5baca31fca434c197e65567f1ee8672b2/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py#L573) where `chunk_finish_reason = ""` is initialized. When streaming chunks don't provide a finish_reason (which may be the case with Llama providers), this empty string fails OpenAI...
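A minimal sketch of the failure mode and one possible guard, not the actual streaming.py code; the choice of `"stop"` as a fallback is an assumption:

```python
from typing import Iterable, Optional


def resolve_finish_reason(chunk_reasons: Iterable[Optional[str]]) -> str:
    """Collapse per-chunk finish_reasons, falling back to 'stop' if none arrived.

    Mirrors the pattern where chunk_finish_reason starts as "" and is only
    overwritten when a chunk actually carries a finish_reason.
    """
    finish_reason = ""
    for reason in chunk_reasons:
        if reason:
            finish_reason = reason
    return finish_reason or "stop"  # assumption: 'stop' is an acceptable default


print(resolve_finish_reason([None, None]))      # 'stop' instead of ''
print(resolve_finish_reason([None, "length"]))  # 'length'
```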
@ashwinb this error still reproduces; can I take it? It looks like an easy fix in server.py: replacing `await event_gen.aclose()` (which doesn't exist for `AsyncStream` in the openai with...
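A hedged sketch of one way the cleanup could tolerate both shapes; `close_event_stream` is a hypothetical helper, not the actual server.py change:

```python
import inspect


async def close_event_stream(event_gen) -> None:
    """Close either an async generator (aclose) or a stream object (close)."""
    aclose = getattr(event_gen, "aclose", None)
    if aclose is not None:
        await aclose()
        return
    close = getattr(event_gen, "close", None)
    if close is not None:
        result = close()
        if inspect.isawaitable(result):  # close() may be sync or async
            await result
```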
In addition to the fix itself, as part of the PR the user can provide the following as an argument: `{"tags": "tag0,tag1"}`, to be split again later by the user with `output.split(',')`.
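A tiny round-trip example of the comma-separated tags workaround described above (assumed shapes, not the actual API):

```python
tags = ["tag0", "tag1"]

request_args = {"tags": ",".join(tags)}  # what the caller passes in
output = request_args["tags"]            # what comes back, as a single string
print(output.split(","))                 # ['tag0', 'tag1']
```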
When running in test mode with Gunicorn:
- Multiple worker processes are spawned
- Each worker has separate telemetry instrumentation
- The mock OTLP collector can't capture spans from all workers
- Tests expect...
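A hedged sketch of one way tests could sidestep this, by forcing a single worker so all spans land in one process; `GUNICORN_CMD_ARGS` is a standard Gunicorn environment variable, while the helper name and the `myapp:app` module path are hypothetical:

```python
import os
import subprocess


def run_server_single_worker() -> subprocess.Popen:
    """Start the server under test with a single Gunicorn worker (hypothetical helper)."""
    env = dict(os.environ)
    env["GUNICORN_CMD_ARGS"] = "--workers 1"  # keep all spans in a single process
    return subprocess.Popen(["gunicorn", "myapp:app"], env=env)
```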
> This looks good, my only comment would be to fail the server start if the metadata store is SQLite AND Gunicorn is used. If we have multiple workers...
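A hedged sketch of the suggested startup guard; the function name and arguments are hypothetical stand-ins, not the actual llama-stack configuration API:

```python
def validate_metadata_store(store_type: str, worker_count: int) -> None:
    """Refuse to start when a SQLite metadata store would be shared by multiple workers."""
    if store_type.lower() == "sqlite" and worker_count > 1:
        raise RuntimeError(
            "SQLite metadata store is not safe with multiple Gunicorn workers; "
            "run a single worker or switch to a server-backed store."
        )


validate_metadata_store("postgres", 4)  # ok
validate_metadata_store("sqlite", 1)    # ok
# validate_metadata_store("sqlite", 4)  # would raise RuntimeError at server start
```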