azure-search-openai-demo icon indicating copy to clipboard operation
azure-search-openai-demo copied to clipboard

Formatting request decreases the query-quality

Open elhele opened this issue 1 year ago • 5 comments

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ x ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

When asking questions related to the documents, it's noticed that adding formatting requests like "as a list" or "as a table in English" negatively affects search outcomes. For instance, posing a question about "quality management and improvement procedures" produces better results compared to "quality management and improvement procedures as a table with a numbered list". This leads to significantly lower rankings and a different order of results in Azure Cognitive Search, which distorts the result.

Examples:

quality management and improvement procedures "Quality management" AND "improvement procedures" rerankerScore of the most relevant document: 3.495

quality management and improvement procedures as a table "quality management" "improvement procedures" table rerankerScore of the most relevant document: 2.042

quality management and improvement procedures as a table with a numbered list in english "Quality management improvement procedures table" rerankerScore of the most relevant document: 1.933

Any log messages given by the failure

N/A

Expected/desired behavior

The generated queries do not depend on any formatting or language requests in the question text, formatting is only important for the answer generation in "Step 3" and not for the query generation in "Step 1". It would be great if it was possible to adjust query_prompt_template in such a way that it'd focus on the factual part and ignore the formatting part.

OS and Version?

macOS Sonoma Version 14.2.1 (23C71)

azd version?

azd version 1.5.0

Versions

latest

Mention any other details that might be useful


Thanks! We'll be in touch soon.

elhele avatar Jan 10 '24 16:01 elhele

Thanks for the feedback.

Can you open the developer settings tab and experiment with the following options to see if your results improve?

  1. Text only search
  2. Vector only search
  3. Hybrid search
  4. Semantic ranking enabled / disabled

Also, do you get different results on "chat" vs. "ask a question" modes? Please let me know.

mattgotteiner avatar Jan 12 '24 18:01 mattgotteiner

Hello @mattgotteiner, sorry for the delayed answer.

Unfortunately, I could not see much of a difference after those changes. I've omitted "Text only search" though, because in this case the results are usually much worse and I'm getting "None" in doc['@search.reranker_score'], which I need for my project

elhele avatar Jan 30 '24 09:01 elhele

Matt might have other suggestions as well, but I'm curious if you've tried modifying the few-shot examples that are sent with the call to generate the search query, to specifically show the model that you do not want "list" and "table" included.

pamelafox avatar Jan 30 '24 22:01 pamelafox

@pamelafox yes, that was my first idea :) I also tried to adjust the prompt query_prompt_template with something like "ignore 'as table' and 'as a list'", "ignore all formatting requests like 'as a table'" and so on, but it didn't change anything

elhele avatar Jan 31 '24 08:01 elhele

Huh. One more idea on the prompt engineering side: change search_sources function to also take a "format" parameter, and suggest it to put "list" and "table" there.

@mattgotteiner might have other ideas on the search index side, perhaps using custom analyzers (https://learn.microsoft.com/en-us/azure/search/search-analyzers) to strip out such directions. It'd be nice to figure out an approach that'd work well for everyone.

pamelafox avatar Feb 01 '24 00:02 pamelafox