azure-search-openai-demo: It's not working properly with larger data and sometimes gives wrong answers with wrong citations. Has anybody faced the same issue? We have 40 PDFs indexed.

Oct 30 '23 16:10 sunnygambit1989

@pamelafox Any suggestions?

Oct 30 '23 16:10 sunnygambit1989

Hi @sunnygambit1989 - My general recommendation is to look at the thought process to determine what aspect of the process went awry. Remember that there are multiple steps for /chat:

1. Turn the user question into a search query using ChatGPT
2. Search Azure Cognitive Search using that search query
3. Send the user question and search results to ChatGPT

So what you want to determine is whether Azure Cognitive Search found the wrong results, whether ChatGPT came up with the wrong answer based on the right results, or whether something went wrong in the indexing step.
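
To make that debugging approach concrete, here is a minimal sketch of the three steps with every intermediate result kept for inspection. It is not the demo's actual code: `run_chat`, the three `fake_*` stand-ins, and the `DOCS` dictionary are all hypothetical names invented for illustration, and the "search" is a toy word-overlap match rather than Azure Cognitive Search.

```python
from typing import Callable, Dict, List

def run_chat(
    question: str,
    generate_query: Callable[[str], str],
    search: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Run the three /chat steps and return each intermediate result."""
    query = generate_query(question)    # step 1: question -> search query
    results = search(query)             # step 2: search query -> results
    reply = answer(question, results)   # step 3: question + results -> answer
    return {"query": query, "results": results, "answer": reply}

# Hypothetical stand-ins so the sketch runs without any Azure services.
DOCS = {
    "cname.pdf": "create a cname record for your domain",
    "billing.pdf": "update your billing address",
}

def fake_generate_query(question: str) -> str:
    return question.lower().rstrip("?")

def fake_search(query: str) -> List[str]:
    words = set(query.split())
    return [name for name, text in DOCS.items() if words & set(text.split())]

def fake_answer(question: str, results: List[str]) -> str:
    return f"See {results[0]}" if results else "I don't know."

trace = run_chat(
    "How to create a CNAME record?", fake_generate_query, fake_search, fake_answer
)
for step, value in trace.items():
    print(f"{step}: {value}")
```

Looking at `trace["results"]` separately from `trace["answer"]` is what tells you whether the search step or the answer step went wrong.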

Oct 30 '23 16:10 pamelafox

@pamelafox Azure Cognitive Search found the wrong result.

Oct 30 '23 16:10 sunnygambit1989

Okay, in that case, the next step is usually to try different configurations, like just text search, just vector search, semantic ranker disabled, etc. I don't know what configuration you're currently using. Generally, hybrid + semantic ranker is the best overall combo, according to research the search team did, but I've seen some situations where it didn't work. For example, if you're querying for an exact title, or an exact number, then vector search may not be a good fit.
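
One way to work through those configurations systematically is to enumerate them and try the same question against each. The sketch below only builds the list of combinations. The `retrieval_mode` key and the `"text"` value appear later in this thread; the `"vectors"` and `"hybrid"` values and the `semantic_ranker` key are assumptions about how the app names these options, so check them against your copy of the code.

```python
from itertools import product
from typing import Dict, List

# Assumed option names; only retrieval_mode="text" is confirmed in this thread.
RETRIEVAL_MODES = ["text", "vectors", "hybrid"]

def candidate_overrides() -> List[Dict[str, object]]:
    """Return one overrides dict per retrieval mode / semantic ranker combination."""
    return [
        {"retrieval_mode": mode, "semantic_ranker": ranker}
        for mode, ranker in product(RETRIEVAL_MODES, [True, False])
    ]

configs = candidate_overrides()
for config in configs:
    print(config)
```

Running the failing query once per entry and comparing the returned documents should show which setting is responsible for the irrelevant results.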

Oct 30 '23 16:10 pamelafox

@pamelafox I haven't changed anything; it's the default search, i.e. hybrid + semantic search. I'm searching for "How to create a CNAME record?"

Oct 30 '23 16:10 sunnygambit1989

I'm not sure why it's having a hard time finding that document, but I've written up more tips for debugging search results here:

https://github.com/Azure-Samples/azure-search-openai-demo/pull/889/files#diff-4795f0c47daaacdcc54b5ec64b46df2c416b04abcbba2ef60590ad23ae025a5aR51

Please read through those tips and see if they can help you, like running queries in the search explorer in the Portal.

If those don't help, please send or attach the actual document so I can try for myself.

Oct 31 '23 19:10 pamelafox

@pamelafox We discovered the problem yesterday. Azure Cognitive Search is returning contextually related documents as results. For example, when I searched for "How to create a CNAME record?", Cognitive Search returned results that contain information about domains, because CNAME is related to domains, regardless of whether they are specifically about CNAME.

Nov 01 '23 04:11 sunnygambit1989

Do you get better results for that query when you turn off vectors? I'd expect that similarity to occur in vector space but not in text matching. I'm surprised the CNAME document isn't getting ranked higher though. CC @srbalakr from the search team

Nov 01 '23 04:11 pamelafox

@pamelafox The CNAME document is getting ranked higher, but a problem arises when I don't have access to the CNAME document and I ask about it. In that case, we expect it to respond with "I don't know." Instead, it returns other documents with citations.

Let me try turning off vectors.

Nov 01 '23 04:11 sunnygambit1989

Hi @pamelafox, @sunnygambit1989 - I am facing a similar issue. The search results are a lot better and more relevant when I use the 'Text' search on the front end instead of 'Vectors'.

We just went through an exercise to remove vector embeddings from our datasets. How can I turn off the 'Vectors + Text (Hybrid)' retrieval method and make 'Text' the default for all end users on the backend? We have customized the frontend of the application, so I would like to make the 'Text' retrieval method the default on the backend.

Nov 14 '23 15:11 en-coded

@pamelafox Thanks for the response. Is there a PR for this?

Nov 14 '23 16:11 sunnygambit1989

There are generally multiple ways to override defaults. You can do it in the frontend by changing the default of the dropdowns, like:

const [retrievalMode, setRetrievalMode] = useState<RetrievalMode>(RetrievalMode.Text);

But you may want to just delete that dropdown entirely.

In the backend, you can specify it in the overrides, like I do in this branch:

https://github.com/Azure-Samples/azure-search-openai-demo/compare/main...pamelafox:azure-search-openai-demo:justtext?expand=1

And for data ingestion, you can pass in --novectors to avoid adding vector embeddings to the index.

Nov 14 '23 23:11 pamelafox

Sent PR to document: https://github.com/Azure-Samples/azure-search-openai-demo/pull/955

Nov 14 '23 23:11 pamelafox

Hi @pamelafox,

Regarding the data ingestion option, do I specify --novectors in the indexers? Would you please elaborate on it?

I see that in the PR you have only modified chatreadretrieveread.py to include the statement 'overrides["retrieval_mode"] = "text"'. Do I need to include the same statement in retrievethenread.py?

Thank you very much!

Nov 15 '23 02:11 en-coded

@en-coded It'd be an additional argument in prepdocs.sh or prepdocs.ps1. I've added it to the branch: https://github.com/Azure-Samples/azure-search-openai-demo/compare/main...pamelafox:azure-search-openai-demo:justtext?expand=1

And yes, same in retrievethenread, I've also added that to the branch.
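
For readers who have not opened the branch, the backend change amounts to pinning the retrieval mode before the search runs. The sketch below shows that idea in isolation. `force_text_retrieval` is a hypothetical helper, not a function in the repo, and the `top` key in the usage is only a placeholder for whatever other overrides a request carries. The only part taken from this thread is the statement `overrides["retrieval_mode"] = "text"`.

```python
from typing import Any, Dict

def force_text_retrieval(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request overrides with retrieval pinned to text search."""
    pinned = dict(overrides)            # copy so the caller's dict is left untouched
    pinned["retrieval_mode"] = "text"   # the statement quoted earlier in this thread
    return pinned

# Whatever the frontend sends, the backend ends up using text retrieval.
incoming = {"retrieval_mode": "hybrid", "top": 3}
effective = force_text_retrieval(incoming)
print(effective)
```

The same pinning would apply in both chatreadretrieveread.py and retrievethenread.py, since each approach reads its own overrides.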

Nov 15 '23 06:11 pamelafox

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

Jan 23 '24 01:01 github-actions[bot]
