azure-search-openai-demo
It's not working properly with larger data sets and sometimes gives wrong answers with wrong citations. Has anybody faced the same issue? We have 40 PDFs indexed.
@pamelafox Any suggestions?
Hi @sunnygambit1989 - My general recommendation is to look at the thought process to determine what aspect of the process went awry. Remember that there are multiple steps for /chat:
- Turn user question into search query using ChatGPT
- Search Azure Cognitive Search using that search query
- Send user question and search results to ChatGPT
So what you want to determine is whether Azure Cognitive Search found the wrong results, or whether ChatGPT came up with the wrong answer based on right results. Or if something went wrong in the indexing step.
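To make that kind of diagnosis concrete, here is a minimal sketch that runs the three steps in order and keeps every intermediate output for inspection. It is not the sample's actual code: `ChatTrace`, `run_chat_with_trace`, the three callables, and the `domains.pdf` result are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChatTrace:
    """Intermediate outputs of one /chat request, kept for inspection."""
    question: str
    search_query: str
    search_results: List[str]
    answer: str


def run_chat_with_trace(
    question: str,
    make_query: Callable[[str], str],
    search: Callable[[str], List[str]],
    answer_from_results: Callable[[str, List[str]], str],
) -> ChatTrace:
    # Step 1: turn the user question into a search query.
    search_query = make_query(question)
    # Step 2: run the search query against the index.
    search_results = search(search_query)
    # Step 3: send the question plus the search results to the model.
    answer = answer_from_results(question, search_results)
    return ChatTrace(question, search_query, search_results, answer)


# Stub stages, for illustration only (hypothetical data).
trace = run_chat_with_trace(
    "How to create a CNAME record?",
    make_query=lambda q: "create CNAME record",
    search=lambda query: ["domains.pdf: general notes on domains"],
    answer_from_results=lambda q, docs: f"Answer based on {len(docs)} document(s)",
)
print(trace.search_query)
print(trace.search_results)
```

If `search_results` already looks wrong for a reasonable `search_query`, the problem is on the search or indexing side rather than in the final ChatGPT call.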
@pamelafox Azure cognitive search found the wrong result.
Okay, in that case, the next step is usually to try different configurations, like just text search, just vector search, semantic ranker disabled, etc. I don't know what configuration you're currently using. Generally, hybrid + semantic ranker is the best overall combo, according to research the search team did, but I've seen some situations where it didn't work. For example, if you're querying for an exact title, or an exact number, then vector search may not be a good fit.
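One way to try those configurations systematically is to enumerate the combinations and run the same question through each. The sketch below only builds the override dictionaries. The `retrieval_mode` key and its `"text"` value are the ones mentioned later in this thread; the `"vectors"` and `"hybrid"` values and the `semantic_ranker` key are assumptions about the sample's override names and should be checked against your version of the code. `build_override_variants` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List, Union

OverrideValue = Union[str, bool]


def build_override_variants() -> List[Dict[str, OverrideValue]]:
    """Return one overrides dict per retrieval mode / semantic ranker combination."""
    modes = ["text", "vectors", "hybrid"]  # "vectors" and "hybrid" are assumed values
    semantic_options = [False, True]       # "semantic_ranker" is an assumed key
    return [
        {"retrieval_mode": mode, "semantic_ranker": use_semantic}
        for mode, use_semantic in product(modes, semantic_options)
    ]


variants = build_override_variants()
for overrides in variants:
    print(overrides)
```

Some combinations may turn out to be redundant in practice; the point is only to compare the retrieved documents for the same question under each setting.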
@pamelafox I haven't changed anything; it's the default search, i.e. hybrid + semantic search.
I'm searching for "How to create a CNAME record?"
I'm not sure why it's having a hard time finding that document, but I've written up more tips for debugging search results here:
https://github.com/Azure-Samples/azure-search-openai-demo/pull/889/files#diff-4795f0c47daaacdcc54b5ec64b46df2c416b04abcbba2ef60590ad23ae025a5aR51
Please read through those tips and see if they can help you, like running queries in the search explorer in the Portal.
If those don't help, please send or attach the actual document so I can try for myself.
@pamelafox We discovered the problem yesterday. Azure Cognitive Search is providing contextual documents as results. For example, when I searched, "How to create a CNAME record?" CNAME is related to domains, so Cognitive Search returns results that contain information about domains, regardless of whether it is specifically related to CNAME or not.
Do you get better results for that query when you turn off vectors? I'd expect that similarity to occur in vector space but not text matching. I'm surprised the CNAME document isn't getting ranked higher though. CC @srbalakr from search team
@pamelafox The CNAME document is getting ranked higher, but a problem arises when I don't have access to the CNAME document and I ask about it. In that case, we expect it to respond with "I don't know." Instead, it returns some other documents with citations.
Let me try turning off vectors.
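One possible guard for the "I don't know" case, sketched here under assumptions and not something the sample is confirmed to do, is to drop search results below a minimum relevance score before the answer step and decline when nothing is left. The `(text, score)` result shape, the `min_score` threshold, and the helper names are hypothetical, and the threshold would need tuning for whatever scoring scale your retrieval mode returns.

```python
from typing import Callable, List, Tuple

FALLBACK_ANSWER = "I don't know."


def keep_confident_results(
    results: List[Tuple[str, float]], min_score: float
) -> List[str]:
    """Keep only the result texts whose score meets the threshold."""
    return [text for text, score in results if score >= min_score]


def answer_or_decline(
    results: List[Tuple[str, float]],
    min_score: float,
    generate_answer: Callable[[List[str]], str],
) -> str:
    confident = keep_confident_results(results, min_score)
    if not confident:
        # Nothing relevant enough was retrieved, so decline instead of
        # answering from loosely related documents.
        return FALLBACK_ANSWER
    return generate_answer(confident)


# Hypothetical scores: a loosely related "domains" document only.
weak_results = [("General notes on domains", 0.4)]
print(answer_or_decline(weak_results, 0.8, lambda docs: "Answer from " + docs[0]))
```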
Hi @pamelafox , @sunnygambit1989 - I am facing a similar issue. The search results are a lot better and more relevant when I use the 'Text' search on the front end instead of 'Vectors'.
We just went through an exercise to remove vector embeddings from our datasets. How can I turn off the 'Vectors + Text (Hybrid)' retrieval method and make 'Text' the default for all end users on the backend? We have customized the frontend of the application, so I would like to make the 'Text' retrieval method the default on the backend.
@pamelafox Thanks for the response. Is there any PR for this?
There are generally multiple ways to override defaults. You can do it in the frontend by changing the default of the dropdowns, like:
const [retrievalMode, setRetrievalMode] = useState<RetrievalMode>(RetrievalMode.Text);
But you may want to just delete that dropdown entirely.
In the backend, you can specify it in the overrides, like I do in this branch:
https://github.com/Azure-Samples/azure-search-openai-demo/compare/main...pamelafox:azure-search-openai-demo:justtext?expand=1
And for data ingestion, you can pass in --novectors to avoid adding vector embeddings to the index.
Sent PR to document: https://github.com/Azure-Samples/azure-search-openai-demo/pull/955
Hi @pamelafox,
Regarding the data ingestion option, do I specify --novectors in the indexers? Would you please elaborate on it?
I see that in the PR you have only modified chatreadretrieveread.py to include the statement 'overrides["retrieval_mode"] = "text"'. Do I need to include the same statement in retrievethenread.py?
Thank you very much!
@en-coded It'd be an additional argument in prepdocs.sh or prepdocs.ps1, I've added it to the branch: https://github.com/Azure-Samples/azure-search-openai-demo/compare/main...pamelafox:azure-search-openai-demo:justtext?expand=1
And yes, same in retrievethenread, I've also added that to the branch.
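As a rough illustration of the backend override, the sketch below forces text retrieval regardless of what the frontend sends. The `overrides["retrieval_mode"] = "text"` assignment is the statement quoted above from the branch; the `force_text_retrieval` wrapper and the `top` key in the example are hypothetical and only show where such an assignment could sit.

```python
from typing import Any, Dict, Optional


def force_text_retrieval(overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the request overrides with retrieval pinned to text."""
    forced = dict(overrides or {})
    # Ignore whatever retrieval mode the client asked for.
    forced["retrieval_mode"] = "text"
    return forced


# Example: the frontend asked for hybrid, the backend pins it to text.
print(force_text_retrieval({"retrieval_mode": "hybrid", "top": 3}))
```

Copying the dictionary keeps the caller's original overrides unchanged, which can make it easier to log what the frontend actually requested.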
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.