Implement Contextual Retrieval and Contextual Preprocessing
🚀 Describe the new functionality needed
Anthropic published Contextual Retrieval and Contextual Preprocessing, and I think adding this behavior to the stack would be beneficial (several customers have explicitly asked for it to be included in the stack).
In short, Anthropic recommends using an LLM to summarize a chunk within the broader context of a document before embedding it.
An example:
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
The contextualized_chunk is generated by running inference with the following prompt:
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
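For illustration, here is a minimal sketch of how a chunk could be contextualized with that prompt using the openai Python client (any OpenAI-compatible inference endpoint would work). The model name, client setup, and helper names are placeholder assumptions, not a proposed llama-stack API:

```python
# Minimal sketch (not the proposed llama-stack API): contextualize one chunk
# with an OpenAI-compatible chat endpoint before it is embedded. The prompt is
# the one quoted above; client setup, model name, and helper names are
# illustrative assumptions.
from openai import OpenAI

CONTEXTUALIZE_PROMPT = """<document>
{whole_document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk_content}
</chunk>
Please give a short succinct context to situate this chunk within the overall \
document for the purposes of improving search retrieval of the chunk. \
Answer only with the succinct context and nothing else."""

client = OpenAI()  # assumes OPENAI_API_KEY or a compatible base_url is configured


def contextualize_chunk(
    whole_document: str, chunk_content: str, model: str = "gpt-4o-mini"
) -> str:
    """Return the original chunk prefixed with a short, LLM-generated context."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": CONTEXTUALIZE_PROMPT.format(
                    whole_document=whole_document, chunk_content=chunk_content
                ),
            }
        ],
    )
    context = (response.choices[0].message.content or "").strip()
    # The context + original chunk text is what gets embedded and indexed.
    return f"{context} {chunk_content}"
```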
This extension would require enhancing the existing OpenAIVectorStoreMixin.openai_attach_file_to_vector_store() behavior, and we could make it a configurable option; see the sketch below.
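As a rough sketch of what the configurable option might look like (the field names, config shape, and placement are assumptions on my part, not the current llama-stack interface):

```python
# Hypothetical configuration shape; nothing here is the existing API.
from pydantic import BaseModel


class ContextualChunkingConfig(BaseModel):
    enabled: bool = False
    # Model used to generate the situating context for each chunk (assumed field).
    context_model: str = "gpt-4o-mini"


def maybe_contextualize(
    chunks: list[str], whole_document: str, cfg: ContextualChunkingConfig
) -> list[str]:
    """Apply contextual preprocessing when enabled, e.g. inside
    openai_attach_file_to_vector_store() after chunking and before embedding."""
    if not cfg.enabled:
        return chunks
    return [
        contextualize_chunk(whole_document, c, model=cfg.context_model)  # from the sketch above
        for c in chunks
    ]
```

Whether a flag like this lives in provider config or per-request parameters is an open design question.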
💡 Why is this needed? What if we don't build it?
Improved retrieval quality: embedding chunks together with a short, document-level context reduces failed retrievals for chunks that are ambiguous on their own. If we don't build it, users who want contextual preprocessing have to implement it themselves outside the stack.
Other thoughts
No response
@franciscojavierarceo have this and #4021 been addressed in any way? I'd like to take on this effort.
We probably need https://github.com/llamastack/llama-stack/pull/4113 to land first to iron out some of the foundation.