azure-search-openai-demo

Cognitive search randomly returns no results

Open d-paulus opened this issue 1 year ago • 14 comments

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Ask questions (and follow-up questions) whose queries return results in the Cognitive Search portal. In the app, the Cognitive Search client sometimes returns hits and sometimes does not; it appears to be random. The index is correct, and running the same queries in the Cognitive Search portal returns results.

Any log messages given by the failure

content = "\n".join(results) # is empty

Expected/desired behavior

content = "\n".join(results) # holds the items returned from Cognitive Search index

OS and Version?

Windows 10, Linux

azd version?

1.2

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

d-paulus avatar Aug 13 '23 13:08 d-paulus

Hi @d-paulus - I assume this is using the /chat endpoint? The first step of that endpoint uses a ChatGPT model to generate a query based on the user's question. I'm guessing that the query is changing each time, and that some of those queries are better than others. Can you log out the queries and confirm that the issue is the query generation?

It's this code in chatreadretrieveread.py:

        chat_completion = await openai.ChatCompletion.acreate(
            deployment_id=self.chatgpt_deployment,
            model=self.chatgpt_model,
            messages=messages,
            temperature=0.0,
            max_tokens=32,
            n=1)

        query_text = chat_completion.choices[0].message.content
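
A minimal sketch of the kind of logging that could confirm this. It assumes a hypothetical `run_search` coroutine that wraps the app's search call and returns a list of strings; `search_with_logging` and `run_search` are illustrative names, not part of the repository:

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("chat_search_debug")


async def search_with_logging(
    query_text: str,
    run_search: Callable[[str], Awaitable[List[str]]],  # hypothetical wrapper around the search call
) -> List[str]:
    """Log the generated query and the number of hits it produced."""
    # repr() makes stray quotation marks or whitespace in the query visible.
    logger.info("generated search query: %r", query_text)
    results = await run_search(query_text)
    logger.info("search returned %d result(s)", len(results))
    if not results:
        logger.warning("empty result set for query %r", query_text)
    return results
```

Comparing the logged query strings for turns that return hits against turns that return nothing should show whether the generated query is what varies.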

pamelafox avatar Aug 15 '23 17:08 pamelafox

We printed the queries and they are valid, meaning when we use them in Cog Search portal, using the same index and api_version, we get results.

Yes, using /chat and chatreadretrieveread.py.

Can you confirm await self.search_client.search(query_text,...) always returns results when the query is valid, and index name and api_version match, especially for follow up questions?

d-paulus avatar Aug 17 '23 12:08 d-paulus

Interesting, so for the same inputs, you're getting different results from the Cognitive Search call? I'm not an ACS expert so I'll CC @srbalakr from the ACS team to see if he has ideas.

pamelafox avatar Aug 17 '23 21:08 pamelafox

Hi @d-paulus, it's unlikely to occur. By default, the app has semantic ranking and hybrid search (vector + text) enabled. Are you using the default search option, vector only, or text only? Can you try the different options to narrow down the root cause?

  1. Could you also check if there were any updates to the index?
  2. For empty results, was the response code 200?
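
One way to organize that comparison is to run the same query through each mode and count the hits. This is only a sketch: the `searchers` mapping and both function names are hypothetical, and each callable stands in for however the app issues a text-only, vector-only, or hybrid search:

```python
from typing import Callable, Dict, List


def count_hits_by_mode(
    query_text: str,
    searchers: Dict[str, Callable[[str], List[str]]],  # hypothetical: mode name -> search callable
) -> Dict[str, int]:
    """Run the same query through every search mode and count the hits."""
    return {mode: len(search(query_text)) for mode, search in searchers.items()}


def modes_with_no_hits(counts: Dict[str, int]) -> List[str]:
    """Return the modes that produced an empty result set."""
    return [mode for mode, hits in counts.items() if hits == 0]
```

If only one mode comes back empty for a given query, that narrows where to look.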

srbalakr avatar Aug 17 '23 22:08 srbalakr

I am also experiencing something similar.

@srbalakr , @pamelafox

  1. Search results that appear as the top results on the 'Search Explorer' of the gptkbindex do not appear on the frontend application.
  2. I have been experimenting with hybrid and vector-only options, but the search results on the front end are different from what's displayed on the search explorer.

I've had little success tweaking scoring profiles and changing index refresh frequency and semantic configurations for better results.

en-coded avatar Aug 20 '23 16:08 en-coded

Running via the web app against blobs in a storage account, a search for a single word, e.g. test, does not yield results. Asking for the same word with the initial letter capitalized, i.e. Test, usually (but not always) returns a response from a blob document. Asking a fuller question that more closely aligns with the blob contents, e.g. what is a test?, does get an answer. Can anyone advise what is happening with regard to capitalisation?

IainD925 avatar Aug 21 '23 11:08 IainD925

> I am also experiencing something similar.
>
> @srbalakr , @pamelafox
>
>   1. Search results that appear as the top results on the 'Search Explorer' of the gptkbindex do not appear on the frontend application.
>   2. I have been experimenting with hybrid and vector-only options, but the search results on the front end are different from what's displayed on the search explorer.
>
> I've had little success tweaking scoring profiles and changing index refresh frequency and semantic configurations for better results.

Hi, do you see this issue in the chat feature or the ask feature? In the chat feature, GPT generates the search query as part of the RAG flow, and that step sometimes errors out. Could you check the 'thought process' icon in the UI to see whether the generated query fits the original question?

srbalakr avatar Aug 23 '23 18:08 srbalakr

> Running via the web app against blobs in a storage account, a search for a single word, e.g. test, does not yield results. Asking for the same word with the initial letter capitalized, i.e. Test, usually (but not always) returns a response from a blob document. Asking a fuller question that more closely aligns with the blob contents, e.g. what is a test?, does get an answer. Can anyone advise what is happening with regard to capitalisation?

Hi, analyzers are case-insensitive by default. Do you have a custom analyzer? Also, as above, are you using the chat approach or the ask approach? Could you check the query generated by GPT for the chat approach?

srbalakr avatar Aug 23 '23 18:08 srbalakr

Hi @srbalakr, we're not using a custom analyzer. We've set a simple system message "you are a HR professional" which appears to follow the 'ask' approach as it will not reply with a piece of user provided information such as the company I work for, if asked later in the conversation.

Interestingly I've noticed that the app will wander off the document store content if a previous answer contained a word you want to use. Example response to a valid question "...should be a safe and secure environment" - as per documentation.

To which I ask "What are the biggest risks to an IT professional in London?" - the documentation has zero references to London or cyber attacks but responds:

"I'm sorry, but the retrieved documents do not provide information on the biggest risks to an IT professional in London. However, some potential risks to consider could include cyber attacks, data breaches, and physical security threats. It is important to assess the specific risks associated with the employee's role and the nature of their visit, and take appropriate steps to mitigate those risks. This could include providing the employee with training and support, ensuring that their devices and data are secure, and arranging for appropriate security measures such as secure transportation and accommodations."

Clearly there's a balance between being helpful and coming up with content that may not comply with company policy.

IainD925 avatar Aug 30 '23 15:08 IainD925

In some cases, no results are returned from CogSearch because the query is put in quotation marks, or the query contains multiple parts, some of them enclosed in quotation marks.
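
In the simple query syntax, double quotes mark a phrase search, so a quoted query only matches documents containing that exact phrase. A minimal sketch of one possible workaround is to strip the quotation marks from the generated query before sending it. `strip_quotes` is a hypothetical helper, not something in the repository, and it trades phrase matching for plain term matching:

```python
import re

# Straight and curly double quotes that a chat model may wrap around a query.
_QUOTE_PATTERN = re.compile('["\u201c\u201d]')


def strip_quotes(query_text: str) -> str:
    """Remove double quotes so the query is not treated as a phrase search."""
    without_quotes = _QUOTE_PATTERN.sub(" ", query_text)
    # Collapse the whitespace left behind by the removed quotes.
    return " ".join(without_quotes.split())
```

For example, `strip_quotes('"parental leave" policy')` yields `parental leave policy`, which matches on the individual terms instead of the exact phrase.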

d-paulus avatar Sep 07 '23 12:09 d-paulus

Are you using the latest code from main? The thought process looks like you might be using outdated code. We've made some improvements.

pamelafox avatar Sep 14 '23 19:09 pamelafox

> Are you using the latest code from main? The thought process looks like you might be using outdated code. We've made some improvements.

I'll try that, thanks! I had to delete my question with the screenshots as I realized it might reveal some internal proprietary information.

juichiache avatar Sep 14 '23 19:09 juichiache

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

github-actions[bot] avatar Nov 14 '23 01:11 github-actions[bot]

Hi there - I have a very similar situation and was wondering if there was any resolution to this challenge? Can you please let me know? Thanks!

alpkayaMSFT avatar Mar 07 '24 23:03 alpkayaMSFT