azure-search-openai-demo Relevant text from the source document is being removed from the answer and used as a citation if it is in square brackets

Open elhele opened this issue 2 years ago • 6 comments

Please provide us with the following information:

This issue is for a: (mark with an `x`)

- [x] bug report 
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Load a document that contains text in square brackets, e.g. "[km / h ]"

Any log messages given by the failure

There are no errors in the logs, because it works as it is supposed to according to the code, if I understand it correctly: "Do not include any text inside [] or <<>> in the search query terms."

Expected/desired behavior

The relevant text stays in the answer and is not turned into a citation.

OS and Version?

macOS (Sonoma, Version 14.0)

azd version?

azd version 1.3.1

Versions

Repo version from 25.09 with fixed werkzeug-requirements

Mention any other details that might be useful

Please, see the screenshots

Head of the table from the original document:

Search result:

Oct 05 '23 08:10 elhele

Here is a fix that works for me.

Change the prompt in chatreadretrieveread.py to use [[info1.txt]][[info2.pdf]] instead of just [info1.txt][info2.pdf], or any other delimiter structure that is less likely to appear in a regular response.

Then change the split in AnswerParser.tsx so that it matches the structure you defined in the prompt. In my case it's '[[]]', so the line becomes `const parts = parsedAnswer.split(/\[\[([^\]]+)\]\]/g);`
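To illustrate why the double-bracket delimiter helps, here is a minimal sketch of the split. `splitAnswer` and its return shape are hypothetical names for illustration only and are not the repo's actual AnswerParser.tsx code; the only piece taken from the thread is the regular expression.

```typescript
// Hypothetical helper: only the regex comes from the comment above.
const DOUBLE_BRACKET_CITATION = /\[\[([^\]]+)\]\]/g;

interface ParsedAnswer {
    text: string;        // answer text with citations replaced by (n) markers
    citations: string[]; // unique citation names in order of first appearance
}

function splitAnswer(answer: string): ParsedAnswer {
    // split() with one capture group puts the captured citation names
    // at the odd indices of the resulting array.
    const parts = answer.split(DOUBLE_BRACKET_CITATION);
    const citations: string[] = [];
    let text = "";

    parts.forEach((part, index) => {
        if (index % 2 === 0) {
            // Plain answer text, including single-bracket text such as "[km / h ]".
            text += part;
        } else {
            if (citations.indexOf(part) === -1) {
                citations.push(part);
            }
            text += `(${citations.indexOf(part) + 1})`;
        }
    });

    return { text, citations };
}

const example = splitAnswer("Speed is given in [km / h ] [[info1.txt]].");
console.log(example.text);      // Speed is given in [km / h ] (1).
console.log(example.citations); // [ 'info1.txt' ]
```

Text in single brackets stays in the answer because the pattern only matches when two opening brackets are adjacent.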

Oct 05 '23 11:10 igforce

@igforce I also tried it like this, but I had problems with the "Suggest follow-up questions" functionality when the answer contained square brackets. I thought it might also cause problems in this part of AnswerParser.tsx, where the indices may need to be adjusted:

    if (isStreaming) {
        let lastIndex = parsedAnswer.length;
        for (let i = parsedAnswer.length - 1; i >= 0; i--) {
            if (parsedAnswer[i] === ']') {
                break;
            } else if (parsedAnswer[i] === '[') {
                lastIndex = i;
                break;
            }
        }
        const truncatedAnswer = parsedAnswer.substring(0, lastIndex);
        parsedAnswer = truncatedAnswer;
    }
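One possible adjustment, assuming the prompt has been switched to the `[[...]]` delimiter, is to truncate only an unfinished double-bracket citation while streaming. This is an untested-in-the-repo sketch; `truncateIncompleteCitation` is a hypothetical name and this is not the change that was eventually merged.

```typescript
// Hypothetical helper: hides a citation that has started streaming in
// ("[[info1") but has not been closed yet ("]]"). Assumes "[[...]]" delimiters.
function truncateIncompleteCitation(answer: string): string {
    const open = answer.lastIndexOf("[[");
    const unfinished = open !== -1 && answer.indexOf("]]", open + 2) === -1;
    if (unfinished) {
        // Drop everything from the unclosed "[[" onwards.
        return answer.slice(0, open);
    }
    // A lone trailing "[" may be the first half of "[[" that is still arriving,
    // so hide it until the next chunk shows what it is.
    return answer.endsWith("[") ? answer.slice(0, -1) : answer;
}

console.log(truncateIncompleteCitation("Speed in [km / h ] is"));  // unchanged
console.log(truncateIncompleteCitation("See [[info1"));            // "See "
console.log(truncateIncompleteCitation("See [[info1.txt]] and"));  // unchanged
```

Unlike the single-character scan above, single-bracket text such as "[km / h ]" is left alone, apart from briefly hiding a trailing "[" until the next chunk arrives.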

Oct 05 '23 13:10 elhele

Hm. We could also make the regex check for a filename format, something like `\w+\.\w+`

Oct 09 '23 15:10 pamelafox

@pamelafox it would probably help. It could also be customised by individual users who clone the repo, for example set to only `\w+\.pdf`. Thank you for the reply!
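The filename-format idea can be sketched roughly as follows. `makeCitationRegex` and `extractCitations` are hypothetical names, not functions from the repo, and the pattern is deliberately simple: `\w` does not cover hyphens, spaces, or suffixes like page anchors, so real filenames would likely need a broader character class.

```typescript
// Hypothetical sketch: build a citation pattern that only accepts
// bracketed text that looks like "name.extension".
// Assumes extensions are plain alphanumerics (no regex metacharacters).
function makeCitationRegex(extensions: string[] = []): RegExp {
    const ext = extensions.length > 0 ? `(?:${extensions.join("|")})` : "\\w+";
    // The dot is escaped so that it matches a literal "." only.
    return new RegExp(`\\[(\\w+\\.${ext})\\]`, "g");
}

function extractCitations(answer: string, pattern: RegExp): string[] {
    // split() with one capture group places the captures at odd indices.
    return answer.split(pattern).filter((_, index) => index % 2 === 1);
}

const answer = "Speed in [km / h ] per [info1.txt] and [info2.pdf]";
console.log(extractCitations(answer, makeCitationRegex()));        // [ 'info1.txt', 'info2.pdf' ]
console.log(extractCitations(answer, makeCitationRegex(["pdf"]))); // [ 'info2.pdf' ]
```

With this pattern, "[km / h ]" is never treated as a citation because it contains no `name.extension` sequence.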

Oct 10 '23 13:10 elhele

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

Dec 10 '23 01:12 github-actions[bot]

Merged https://github.com/Azure-Samples/azure-search-openai-demo/pull/2056 which should fix this

Oct 23 '24 18:10 pamelafox
