feat: Add support for query rewrite in vector_store.search
What does this PR do?
Implements query rewrite in the search API: adds `default_query_expansion_model` and `query_expansion_prompt` to `VectorStoresConfig` and makes the `rewrite_query` parameter functional in vector store search.
- `rewrite_query=false` (default): use the original query
- `rewrite_query=true`: expand the query via an LLM, or fail gracefully if no LLM is available
Adds 4 parameters to `VectorStoresConfig`:
- `default_query_expansion_model`: LLM model for query expansion (optional)
- `query_expansion_prompt`: custom prompt template (optional; uses a built-in default)
- `query_expansion_max_tokens`: configurable token limit (default: 100)
- `query_expansion_temperature`: configurable temperature (default: 0.3)
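A rough sketch of what that config surface could look like as a Python object. The field names and defaults come from this PR; the dataclass shape itself is an assumption for illustration, not the actual `VectorStoresConfig` implementation.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical sketch of the 4 new config fields; only the names and
# defaults are taken from the PR description.
@dataclass
class QueryExpansionParams:
    default_query_expansion_model: Optional[str] = None  # e.g. "llama3_1_8b_instruct"
    query_expansion_prompt: Optional[str] = None         # falls back to a built-in template
    query_expansion_max_tokens: int = 100
    query_expansion_temperature: float = 0.3


cfg = QueryExpansionParams(default_query_expansion_model="llama3_1_8b_instruct")
```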
`run.yaml` with rewrite enabled (defaults):

```yaml
vector_stores:
  rewrite_query_params:
    model:
      provider_id: "meta-reference"
      model_id: "llama3_1_8b_instruct"
    # prompt defaults to built-in
    # max_tokens defaults to 100
    # temperature defaults to 0.3
```
Fully customized `run.yaml`:

```yaml
vector_stores:
  rewrite_query_params:
    model:
      provider_id: "meta-reference"
      model_id: "llama3_1_8b_instruct"
    prompt: "Improve this search query: {query}"
    max_tokens: 150
    temperature: 0.2
```
Test Plan
Added a test and recording.
Example script as well:
```python
import asyncio
from io import BytesIO

from llama_stack_client import LlamaStackClient


def gen_file(client, text: str = ""):
    file_buffer = BytesIO(text.encode("utf-8"))
    file_buffer.name = "my_file.txt"
    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants",
    )
    return uploaded_file


async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")
    vs = client.vector_stores.create()
    xf_vs = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    xf_vs1 = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)
    response1 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="apple",
        max_num_results=3,
        rewrite_query=False,
    )
    response2 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="kiwi",
        max_num_results=3,
        rewrite_query=True,
    )
    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")
    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)


if __name__ == "__main__":
    asyncio.run(test_query_rewriting())
```
See the screenshot of the server logs showing it worked.
Notice the log:

```
Query rewritten:
'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'
```
So kiwi was expanded.
@franciscojavierarceo FYI, the example has apple as the first query, but the log shows kiwi twice.
what about having the vector store config specify the rewrite model, and rejecting the request if none is configured?
Yeah, that's actually what I ended up adding; sorry, I requested reviews a bit prematurely. I'll push that update soon.
@franciscojavierarceo please update the description with the new proposed config and user interaction
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
the spec doesn't mention letting api callers override the vector store config, but the implementation appears to let them. is this intended?
@mattf this was added at the request of @raghotham in discord. I can update the PR description to reflect this for sure. I actually would have preferred to do so as a follow up to reduce scope.
the model validation logic should be left to the inference api. if the model is of the wrong type the inference api should handle the error generation.
Sounds good, will revise.
vector store model type validation should happen during startup, if at all. right now the admin sees a successful startup and the user sees 400 errors about a misconfiguration.
Will revise. 👍
@raghotham imho we have two use cases here:
- (a) production - the vector store config gets set at creation time, which should include locking in the embedding model + rewrite model + rewrite prompt. having the rewrite model & prompt flexibility allows for tailoring vector stores to an app (real value here).
- (b) research & development - the best config for the vector store is not known / tuned yet. in this case, it's wasteful to recreate (re-embed) the vector store for each iteration to tune the rewrite model & prompt. therefore, passing a new model & prompt for each query is important.
if we need both (a) and (b) in this pr, i suggest not taking extra args on the query call, but using the existing vector store update path.
Ack @mattf - +1 with the startup validation approach.
@franciscojavierarceo FYI I created #4184 to tackle startup validation for optional dependencies so admins catch misconfigurations at deploy time rather than users hitting 400 errors and needing to debug after. Looks like the validation changes here would align with that pattern.
On the production vs. R&D discussion: Matt's suggestion to use the "update vector store" path seems like a good middle ground - gives R&D the flexibility to iterate without per-query instability. Happy to discuss additional validation patterns in #4184 if it helps.
if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring
Will do.
the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?
@mattf if we want to use self.vector_store.vector_stores_config we'll have to modify VectorStoreWithIndex which will result in requiring us to modify all of the adapters. The globals are the alternative I thought made sense, let me know if you have alternative suggestions.
@mattf I updated the code to handle the case where rewrite is requested but not configured.
Mind if I split the Adapter changes in a subsequent PR?
my mistake, you're right that query_chunks is adapter specific. i was trying to suggest lifting the handling out of the individual adapters entirely. i don't see why adapters need to even know that the user is requesting query rewriting. the adapter just accepts a string to query against. worst case, the rewrite could happen in the mixin?
btw, i noticed that the routing layer only ever passes down a query: str, but the adapters accept a query: InterleavedContent.
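The router-level approach discussed above could look like this rough sketch: the rewrite (and the 400 when it's requested but unconfigured) happens before any adapter is called, so adapters only ever receive a plain query string. All names here are illustrative, not the actual router code.

```python
# Hypothetical router-level handling: adapters never see rewrite_query.
def route_search(query: str, rewrite_query: bool, expand_fn=None) -> str:
    if rewrite_query:
        if expand_fn is None:
            # Feature requested but not configured: surface a 400-style error
            # instead of silently ignoring the flag.
            raise ValueError("rewrite_query=True but no expansion model configured")
        query = expand_fn(query)
    # The adapter just queries against whatever string it is handed.
    return query


assert route_search("kiwi", rewrite_query=False) == "kiwi"
```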
@mattf mind taking a look?
@mattf updated again, thanks for the feedback!
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
@franciscojavierarceo i pushed two commits, one moves the rewrite prompt validation to stack startup and the other moves the rewrite functionality into the router (making it available to all providers)
@mattf I had one small comment but I'm good with this. LMK if there's anything else you'd like to see.