azure-search-openai-demo
Cognitive search randomly returns no results
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Ask a series of questions (and follow-up questions) whose queries return results in the Cognitive Search portal. In the app, the Cognitive Search client sometimes returns hits and sometimes does not; the behavior appears to be random. The index is correct, and the same queries return results when run in the Cognitive Search portal.
Any log messages given by the failure
content = "\n".join(results) # is empty
Expected/desired behavior
content = "\n".join(results) # holds the items returned from Cognitive Search index
OS and Version?
Windows 10, Linux
azd version?
1.2
Versions
Mention any other details that might be useful
Thanks! We'll be in touch soon.
Hi @d-paulus - I assume this is using the /chat endpoint? The first step of that endpoint uses a ChatGPT model to generate a query based on the user's question. I'm guessing that the query is changing each time, and that some of those queries are better than others. Can you log out the queries and confirm that the issue is the query generation?
It's this code in chatreadretrieveread.py:
    chat_completion = await openai.ChatCompletion.acreate(
        deployment_id=self.chatgpt_deployment,
        model=self.chatgpt_model,
        messages=messages,
        temperature=0.0,
        max_tokens=32,
        n=1)
    query_text = chat_completion.choices[0].message.content
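A minimal sketch of how the generated query could be logged next to the hit count, so that runs with empty results can be compared against runs that found documents. The helper name log_search_attempt and the logger name are hypothetical and not part of the repo; it assumes the search results have already been collected into a list of strings.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("search_debug")


def log_search_attempt(query_text: str, results: list[str]) -> bool:
    """Log the generated query and how many hits came back.

    Returns True when at least one result was returned.
    """
    has_hits = len(results) > 0
    # %r shows stray quotation marks and whitespace in the query verbatim.
    logger.info("generated query=%r hits=%d", query_text, len(results))
    if not has_hits:
        logger.warning("empty search result for query=%r", query_text)
    return has_hits
```

Calling it right after the results are gathered, and before content = "\n".join(results), would show whether an empty content string lines up with a particular query shape.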
We printed the queries and they are valid, meaning when we use them in Cog Search portal, using the same index and api_version, we get results.
Yes, using /chat and chatreadretrieveread.py.
Can you confirm await self.search_client.search(query_text,...) always returns results when the query is valid, and the index name and api_version match, especially for follow-up questions?
Interesting, so for the same inputs, you're getting different results from the Cognitive Search call? I'm not an ACS expert so I'll CC @srbalakr from the ACS team to see if he has ideas.
Hi @d-paulus, this is unlikely to occur. By default, the app enables semantic ranking and hybrid search (vector + text). Are you using the default search option, vector only, or text only? Can you try the different options to narrow down the root cause?
- Could you also check if there were any updates to the index?
- For empty results, was the response code 200?
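One way to narrow this down is to run the same query through each search option and compare hit counts. The sketch below assumes each option is wrapped in an async callable that takes the query text and returns a list of hits; compare_modes, empty_modes, and the mode names are hypothetical, and the wrappers around the real search client are left to the reader.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A search function takes the query text and returns a list of hits.
SearchFn = Callable[[str], Awaitable[List[str]]]


async def compare_modes(query_text: str, modes: Dict[str, SearchFn]) -> Dict[str, int]:
    """Run one query through every mode and return the hit count per mode."""
    counts: Dict[str, int] = {}
    for name, search in modes.items():
        hits = await search(query_text)
        counts[name] = len(hits)
    return counts


def empty_modes(counts: Dict[str, int]) -> List[str]:
    """Return the names of the modes that produced no hits, sorted."""
    return sorted(name for name, n in counts.items() if n == 0)
```

If only one mode comes back empty for a given query, that points at the retrieval option rather than at the generated query.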
I am also experiencing something similar.
@srbalakr , @pamelafox
- Search results that appear as the top results on the 'Search Explorer' of the gptkbindex do not appear on the frontend application.
- I have been experimenting with hybrid and vector-only options, but the search results on the front end are different from what's displayed on the search explorer.
I've had little success tweaking scoring profiles and changing index refresh frequency and semantic configurations for better results.
Running via the web app against blobs in a storage account, a search for a single word, e.g. test, does not yield results. Asking for the same word with the initial letter capitalized, i.e. Test, usually (but not always) returns a response from a blob document. Asking a fuller question that more closely matches the blob contents, e.g. what is a test?, does get an answer. Can anyone advise what is happening with regard to capitalisation?
Hi, do you see this issue in the chat feature or the ask feature? In the chat feature, the RAG flow has GPT generate the search query, and that step sometimes errors out. Could you click the 'thought process' icon in the UI and check whether the generated query fits the original question?
Hi, analyzers are case-insensitive by default. Do you have a custom analyzer? Also, as above, are you using the chat approach or the ask approach? Could you check the query generated by GPT for the chat approach?
Hi @srbalakr, we're not using a custom analyzer. We've set a simple system message "you are a HR professional" which appears to follow the 'ask' approach as it will not reply with a piece of user provided information such as the company I work for, if asked later in the conversation.
Interestingly I've noticed that the app will wander off the document store content if a previous answer contained a word you want to use. Example response to a valid question "...should be a safe and secure environment" - as per documentation.
To which I ask "What are the biggest risks to an IT professional in London?" - the documentation has zero references to London or cyber attacks but responds:
"I'm sorry, but the retrieved documents do not provide information on the biggest risks to an IT professional in London. However, some potential risks to consider could include cyber attacks, data breaches, and physical security threats. It is important to assess the specific risks associated with the employee's role and the nature of their visit, and take appropriate steps to mitigate those risks. This could include providing the employee with training and support, ensuring that their devices and data are secure, and arranging for appropriate security measures such as secure transportation and accommodations."
Clearly there's a balance between being helpful and coming up with content that may not comply with company policy.
In some cases, no results are returned from CogSearch because the query is put in quotation marks, or the query contains multiple parts, some of them enclosed in quotation marks.
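In Azure Cognitive Search's simple query syntax, double quotes request a phrase match, so a quoted query only matches documents that contain that exact phrase. One possible workaround, assuming phrase matching is not wanted for generated queries, is to strip the quotation marks before calling the search client. The helper name strip_phrase_quotes is hypothetical and this is only a sketch, not the repo's implementation.

```python
def strip_phrase_quotes(query_text: str) -> str:
    """Remove double quotes so the terms are matched individually.

    Whitespace left behind by the removed quotes is collapsed.
    """
    without_quotes = query_text.replace('"', " ")
    return " ".join(without_quotes.split())
```

For example, a generated query of "parental leave" policy (with the first two words quoted) would be sent as parental leave policy.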
Are you using the latest code from main? The thought process looks like you might be using outdated code. We've made some improvements.
I'll try that, thanks! I had to delete my question with the screenshots as I realized it might reveal some internal proprietary information.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.
Hi there - I have a very similar situation and was wondering if there was any resolution to this challenge? Can you please let me know? Thanks!