azure-search-openai-demo

Relevant text from the source document is being removed from the answer and used as a citation if it is in square brackets

Open elhele opened this issue 2 years ago • 6 comments

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report 
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Load a document that contains text in square brackets, e.g. "[km / h ]"

Any log messages given by the failure

There are no errors in the logs, because it works as it is supposed to according to the code, if I understand it correctly: "Do not include any text inside [] or <<>> in the search query terms."
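A minimal sketch of how the reported behavior can arise, assuming the frontend splits the answer on single square brackets and treats every captured group as a citation. The regex and the `splitAnswer` helper are hypothetical stand-ins for illustration, not the repo's actual code:

```typescript
// Hypothetical single-bracket citation pattern (an assumption, not copied from the repo).
const singleBracket = /\[([^\]]+)\]/g;

function splitAnswer(answer: string): { text: string[]; citations: string[] } {
    // split() with one capture group interleaves plain text and captured contents.
    const pieces = answer.split(singleBracket);
    const text: string[] = [];
    const citations: string[] = [];
    pieces.forEach((piece, index) => {
        // Even indices are plain text, odd indices are the bracket contents.
        if (index % 2 === 0) {
            text.push(piece);
        } else {
            citations.push(piece);
        }
    });
    return { text, citations };
}

const result = splitAnswer("The limit is 50 [km / h ] on this road [info1.txt].");
console.log(result.citations); // the unit "km / h " ends up next to "info1.txt"
console.log(result.text.join("")); // the unit is missing from the visible answer
```

Under this assumption, any bracketed text from the source document is indistinguishable from a citation marker, which matches the symptom described in this report.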

Expected/desired behavior

The relevant text is inside the answer and not the citations.

OS and Version?

macOS (Sonoma, Version 14.0)

azd version?

azd version 1.3.1

Versions

Repo version from 25.09 with fixed werkzeug-requirements

Mention any other details that might be useful

Please, see the screenshots


Head of the table from the original document: (screenshot: Bildschirmfoto 2023-10-05 um 09 53 09)

Search result: (screenshot: Bildschirmfoto 2023-10-05 um 09 53 44)

elhele, Oct 05 '23 08:10

Here is a fix that works for me.

Change the prompt in chatreadretrieveread.py to ask for citations as [[info1.txt]][[info2.pdf]] instead of just [info1.txt][info2.pdf], or use any other delimiter structure that is less likely to appear in a regular response.

Then change the split in AnswerParser.tsx to match the structure you defined in the prompt. In my case that is '[[]]', so the line becomes: const parts = parsedAnswer.split(/\[\[([^\]]+)\]\]/g);
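As a rough illustration of the change described above, the sketch below applies the double-bracket regex from this comment to an answer that also contains ordinary bracketed text. The sample answer string and the `splitOnDoubleBrackets` helper are made up for this example:

```typescript
// Regex quoted in the comment above: only [[...]] counts as a citation.
const doubleBracket = /\[\[([^\]]+)\]\]/g;

// Hypothetical helper wrapping the split call for demonstration purposes.
function splitOnDoubleBrackets(answer: string): string[] {
    return answer.split(doubleBracket);
}

const doubleParts = splitOnDoubleBrackets(
    "The limit is 50 [km / h ] on this road [[info1.txt]]."
);
// Single-bracket text stays in the prose; only the [[...]] content is captured.
console.log(doubleParts);
```

This only works if the model reliably follows the prompt and emits citations with double brackets. Source text that itself contains '[[...]]' would still be misread.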

igforce, Oct 05 '23 11:10

@igforce I also tried it like this, but somehow I had problems with the "Suggest follow-up questions" functionality when the answer contained square brackets. I thought maybe it also creates problems with this part of AnswerParser.tsx, and that these indices should be adjusted:

    if (isStreaming) {
        let lastIndex = parsedAnswer.length;
        for (let i = parsedAnswer.length - 1; i >= 0; i--) {
            if (parsedAnswer[i] === ']') {
                break;
            } else if (parsedAnswer[i] === '[') {
                lastIndex = i;
                break;
            }
        }
        const truncatedAnswer = parsedAnswer.substring(0, lastIndex);
        parsedAnswer = truncatedAnswer;
    }
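One possible way to adapt this truncation step to the '[[ ]]' convention is sketched below. It is an untested guess at what "adjusting the indices" could look like, not a confirmed fix for the follow-up questions problem, and `truncateIncompleteCitation` is a hypothetical name:

```typescript
// Hypothetical helper: hide a citation that is still being streamed,
// assuming citations are delimited by "[[" and "]]".
function truncateIncompleteCitation(answer: string): string {
    const lastOpen = answer.lastIndexOf("[[");
    // An opening "[[" with no closing "]]" after it is an unfinished citation.
    if (lastOpen !== -1 && answer.indexOf("]]", lastOpen) === -1) {
        return answer.substring(0, lastOpen);
    }
    // A lone "[" as the final character may be the first half of "[[".
    if (answer.charAt(answer.length - 1) === "[") {
        return answer.substring(0, answer.length - 1);
    }
    return answer;
}

console.log(truncateIncompleteCitation("See [[info1"));
console.log(truncateIncompleteCitation("See [[info1.txt]] at 50 [km"));
```

Unlike the single-character scan, this version leaves an open single bracket such as "[km" visible while streaming, because it is not treated as the start of a citation.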

elhele, Oct 05 '23 13:10

Hm. We could also make the regex check for filename format, something like \w+.\w+

pamelafox, Oct 09 '23 15:10

@pamelafox it would probably help. It can also be customised and set for example only \w+.pdf by individual users that clone the repo. Thank you for the reply!
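A minimal sketch of the filename-format idea, assuming the dot in the suggested pattern is meant literally and is therefore escaped here. Both regexes and the `extractCitations` helper are illustrative assumptions, not the code that was eventually merged:

```typescript
// Assumed patterns: brackets count as citations only if they contain "name.extension".
const filenameCitation = /\[(\w+\.\w+)\]/g;
// Stricter per-deployment variant that accepts only PDF file names.
const pdfOnlyCitation = /\[(\w+\.pdf)\]/g;

// Hypothetical helper: returns the bracket contents that match the given pattern.
function extractCitations(answer: string, pattern: RegExp): string[] {
    // split() with one capture group puts the captured text at odd indices.
    return answer.split(pattern).filter((_, index) => index % 2 === 1);
}

const sample = "Max speed 50 [km / h ] per [info1.txt] and [report.pdf].";
console.log(extractCitations(sample, filenameCitation));
console.log(extractCitations(sample, pdfOnlyCitation));
```

Note that `\w` does not match spaces, hyphens, or characters such as '#', so file names containing those would need a wider character class.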

elhele, Oct 10 '23 13:10

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

github-actions[bot], Dec 10 '23 01:12

Merged https://github.com/Azure-Samples/azure-search-openai-demo/pull/2056, which should fix this.

pamelafox, Oct 23 '24 18:10