azure-search-openai-demo
Relevant text from the source document is being removed from the answer and used as a citation if it is in square brackets
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Load a document that contains text in square brackets, e.g. "[km / h ]"
Any log messages given by the failure
There are no errors in the logs, because the code works as it is supposed to, if I understand it correctly: "Do not include any text inside [] or <<>> in the search query terms."
Expected/desired behavior
The relevant text is inside the answer and not the citations.
OS and Version?
macOS (Sonoma, Version 14.0)
azd version?
azd version 1.3.1
Versions
Repo version from 25.09 with fixed werkzeug-requirements
Mention any other details that might be useful
Please see the screenshots.
Head of the table from the original document:
Search result:
Here is a fix that works for me.
Change the prompt in chatreadretrieveread.py to cite sources as [[info1.txt]][[info2.pdf]] instead of just [info1.txt][info2.pdf], or use any other delimiter that is unlikely to appear in a regular response.
Then change the split in AnswerParser.tsx to match the structure you defined in the prompt. In my case it's '[[]]': const parts = parsedAnswer.split(/\[\[([^\]]+)\]\]/g);
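To illustrate why the double-bracket delimiter helps, here is a minimal sketch of the split in isolation. The helper name splitCitations and its return shape are hypothetical and not part of the repo; it only assumes the prompt now emits citations as [[file.ext]].

```typescript
// Hypothetical helper (not from the repo): separates answer text from citations.
// Assumes the prompt was changed to emit citations as [[file.ext]].
function splitCitations(answer: string): { text: string[]; citations: string[] } {
    // With one capture group, split() puts the captured citation names at odd indices.
    const parts = answer.split(/\[\[([^\]]+)\]\]/g);
    const text: string[] = [];
    const citations: string[] = [];
    parts.forEach((part, index) => {
        if (index % 2 === 0) {
            text.push(part);
        } else {
            citations.push(part);
        }
    });
    return { text, citations };
}

// Single-bracket text such as "[km / h ]" stays in the answer text.
const example = splitCitations("Speed is given in [km / h ] [[info1.txt]][[info2.pdf]]");
console.log(example.citations);
```

Text in single square brackets is left alone because the pattern requires two opening brackets in a row.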
@igforce I also tried it like this, but I had problems with the "Suggest follow-up questions" functionality when the answer contained square brackets. I thought it might also cause problems with this part of AnswerParser.tsx, and that these indices should be adjusted:
if (isStreaming) {
    let lastIndex = parsedAnswer.length;
    for (let i = parsedAnswer.length - 1; i >= 0; i--) {
        if (parsedAnswer[i] === ']') {
            break;
        } else if (parsedAnswer[i] === '[') {
            lastIndex = i;
            break;
        }
    }
    const truncatedAnswer = parsedAnswer.substring(0, lastIndex);
    parsedAnswer = truncatedAnswer;
}
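One way the streaming truncation might be adjusted for the double-bracket format is sketched below. This is an assumption about what "adjusting the indices" could look like, not the repo's code; the function name truncateUnclosedCitation is hypothetical.

```typescript
// Hypothetical sketch (not from the repo): while streaming, hide a citation
// that has been opened with "[[" but not yet closed with "]]".
function truncateUnclosedCitation(partialAnswer: string): string {
    const lastOpen = partialAnswer.lastIndexOf("[[");
    if (lastOpen === -1) {
        // No double-bracket citation has started, so nothing needs hiding.
        return partialAnswer;
    }
    // Look for a closing "]]" after the last opening "[[".
    const isClosed = partialAnswer.indexOf("]]", lastOpen + 2) !== -1;
    return isClosed ? partialAnswer : partialAnswer.substring(0, lastOpen);
}

console.log(truncateUnclosedCitation("Speed in [km / h ] is listed [[info1"));
```

Unlike the single-character scan, this leaves text such as "[km / h ]" visible during streaming. It does not handle a chunk that ends in a lone "[" which later turns out to be the start of "[[".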
Hm. We could also make the regex check for a filename format, something like \w+\.\w+
@pamelafox That would probably help. Users who clone the repo could also customise it, for example to match only \w+\.pdf. Thank you for the reply!
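The filename-format idea discussed above can be sketched as follows, keeping the single-bracket citation syntax. The function names and the extension parameter are hypothetical, chosen only for illustration, and the pattern is the simple \w+\.\w+ form from the thread with the dot escaped.

```typescript
// Hypothetical sketch (not from the repo): treat [name.ext] as a citation only
// when the bracket content looks like a filename.
function buildCitationPattern(extension: string = "\\w+"): RegExp {
    // Default: any extension. Pass "pdf" to accept only [name.pdf].
    return new RegExp(`\\[(\\w+\\.${extension})\\]`, "g");
}

function extractCitations(answer: string, pattern: RegExp): string[] {
    // Rebuild with the global flag so the exec() loop always advances.
    const re = new RegExp(pattern.source, "g");
    const found: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = re.exec(answer)) !== null) {
        found.push(match[1]);
    }
    return found;
}

const answer = "Speed in [km / h ] [info1.txt][info2.pdf]";
console.log(extractCitations(answer, buildCitationPattern()));
console.log(extractCitations(answer, buildCitationPattern("pdf")));
```

Because "[km / h ]" contains no name.ext sequence, it is not treated as a citation. Filenames with spaces, hyphens, or other characters outside \w would need a wider pattern than this one.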
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.
Merged https://github.com/Azure-Samples/azure-search-openai-demo/pull/2056, which should fix this.