azure-search-openai-demo
Formatting requests decrease query quality
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
When asking questions related to the documents, I noticed that adding formatting requests like "as a list" or "as a table in English" negatively affects search outcomes. For instance, asking about "quality management and improvement procedures" produces better results than "quality management and improvement procedures as a table with a numbered list". The formatting words lead to significantly lower reranker scores and a different order of results in Azure Cognitive Search, which distorts the results.
Examples:
| Question | Generated search query | rerankerScore of most relevant document |
| --- | --- | --- |
| quality management and improvement procedures | "Quality management" AND "improvement procedures" | 3.495 |
| quality management and improvement procedures as a table | "quality management" "improvement procedures" table | 2.042 |
| quality management and improvement procedures as a table with a numbered list in english | "Quality management improvement procedures table" | 1.933 |
Any log messages given by the failure
N/A
Expected/desired behavior
The generated queries should not depend on any formatting or language requests in the question text; formatting only matters for answer generation in "Step 3", not for query generation in "Step 1". It would be great if query_prompt_template could be adjusted so that it focuses on the factual part of the question and ignores the formatting part.
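For illustration, one possible adjustment could look like the sketch below. The variable name `query_prompt_template` matches the repo, but the wording of the added instruction is only a suggestion I have not validated against the model:

```python
# Sketch of an adjusted prompt for the query-generation step ("Step 1").
# The extra sentence about ignoring presentation requests is hypothetical
# wording, not the repo's actual prompt.
query_prompt_template = (
    "Below is a question from a user about a knowledge base. "
    "Generate a search query that captures only the factual intent of the question. "
    "Ignore presentation requests such as 'as a table', 'as a list', 'numbered', "
    "or the desired answer language; those only affect how the final answer is "
    "rendered, not what should be searched.\n"
    "Question: {question}\n"
    "Search query:"
)

print(query_prompt_template.format(
    question="quality management and improvement procedures as a table"
))
```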
OS and Version?
macOS Sonoma Version 14.2.1 (23C71)
azd version?
azd version 1.5.0
Versions
latest
Mention any other details that might be useful
Thanks! We'll be in touch soon.
Thanks for the feedback.
Can you open the developer settings tab and experiment with the following options to see if your results improve?
- Text only search
- Vector only search
- Hybrid search
- Semantic ranking enabled / disabled
Also, do you get different results in "chat" vs. "ask a question" mode? Please let me know.
Hello @mattgotteiner, sorry for the delayed answer.
Unfortunately, I could not see much of a difference after those changes. I omitted "Text only search", though, because in that case the results are usually much worse and I get "None" in doc['@search.reranker_score'], which I need for my project.
Matt might have other suggestions as well, but I'm curious whether you've tried modifying the few-shot examples that are sent with the query-generation call, to explicitly show the model that you do not want "list" and "table" included.
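A minimal sketch of what such few-shot messages could look like; these examples are made up here to demonstrate the pattern, not copied from the repo:

```python
# Hypothetical few-shot examples for the query-generation call. Each user
# question contains formatting words; each assistant reply shows the model
# that those words must be dropped from the generated search query.
query_fewshots = [
    {"role": "user", "content": "quality management and improvement procedures as a table"},
    {"role": "assistant", "content": "quality management improvement procedures"},
    {"role": "user", "content": "show the safety rules as a numbered list in English"},
    {"role": "assistant", "content": "safety rules"},
]
```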
@pamelafox yes, that was my first idea :) I also tried to adjust the prompt query_prompt_template with something like "ignore 'as table' and 'as a list'", "ignore all formatting requests like 'as a table'" and so on, but it didn't change anything
Huh. One more idea on the prompt engineering side: change the search_sources function to also take a "format" parameter, and prompt the model to put "list" and "table" there instead.
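A sketch of that idea in OpenAI function-calling style; the "format" parameter and its description are hypothetical additions, and this schema is not the repo's actual tool definition:

```python
# Sketch of a "search_sources" tool schema with an extra, hypothetical
# "format" parameter: giving the model a dedicated slot for "table"/"list"
# may keep those words out of the search query itself.
search_sources_tool = {
    "type": "function",
    "function": {
        "name": "search_sources",
        "description": "Retrieve sources from the search index",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "Query containing only the factual intent of the question",
                },
                "format": {
                    "type": "string",
                    "description": "Requested answer formatting, e.g. 'table' or "
                                   "'numbered list'; not used for retrieval",
                },
            },
            "required": ["search_query"],
        },
    },
}
```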
@mattgotteiner might have other ideas on the search index side, perhaps using custom analyzers (https://learn.microsoft.com/en-us/azure/search/search-analyzers) to strip out such directions. It'd be nice to figure out an approach that'd work well for everyone.
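For reference, the analyzer approach could look roughly like the fragment below (shown as the REST index definition in a Python dict). The filter and analyzer names are made up here, the stopword list is only a starting point, and a field would still need its searchAnalyzer pointed at the custom analyzer:

```python
# Sketch of an Azure AI Search index fragment: a custom analyzer that
# treats formatting words as stopwords. "formatting_stopwords" and
# "strip_formatting" are hypothetical names.
index_fragment = {
    "tokenFilters": [
        {
            "name": "formatting_stopwords",
            "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
            "stopwords": ["table", "list", "numbered"],
        }
    ],
    "analyzers": [
        {
            "name": "strip_formatting",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "standard_v2",
            # lowercase first, then drop the formatting stopwords
            "tokenFilters": ["lowercase", "formatting_stopwords"],
        }
    ],
}
```

A caveat with this approach: stopwords are removed from every query, so a genuine question about, say, "price lists" could also be affected.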