Memory embeddings example does not work with multiple facts related to one subject
Describe the bug
When adding additional facts about one subject (for example, what I do for work), the result randomly picks a single fact and uses it as the response, instead of stating all the relevant facts.
To Reproduce
Steps to reproduce the behavior:
- Follow the example in 06-memory-and-embeddings.ipynb
- Replace the facts with:
```csharp
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "My name is Andrea");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "I currently work as a tourist operator");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "I also currently work as a bus driver");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "I sometimes work as a teacher");
```
- Ask the question: "what do I do for work?"
- The question returns "I sometimes work as a teacher".
Expected behavior
A full listing of what I do for work.
Desktop (please complete the following information):
- OS: Windows
- IDE: Visual Studio 2022
- NuGet Package Version: 0.13.277.1-preview
Additional context
The issue extends to other facts as well, and was found while trying to develop a chat bot that answers questions about gene-disease associations. When "taught" 1:M associations, I consistently get just one of the answers back, which is incomplete.
I think you'll need to return, say, the top 3 vector results and send those to GPT for a completion that compiles the results. I wouldn't say it doesn't "work"; it's just a proof of concept highlighting the potential.
Here's what I tried following your suggestion:
Q: what are all the things I do for work?
A: I sometimes work as a teacher
Q: what are the 3 things I do for work?
A: I sometimes work as a teacher
The example above highlights the most basic form of vector search and returns only a single result. For me, it helped conceptualize how vector search works in memory. It's definitely not intended to be robust or representative of more advanced use cases.
Just below that one, though, it goes a step further, and I think that is where you'll find the solution to the problem you're describing. It sets a limit to return the top 5 nearest matches. If more than one vector touches on the subject matter of your question, there's a good chance it will come back in those 5 results. I would send those top vectors, together with the question, to GPT as a single completion. GPT is then capable of the reasoning and inference needed to understand that you have three jobs. I'm confident you'd get the answer you're expecting this way.
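As a rough sketch of that suggestion (assuming the preview SDK's `ISemanticTextMemory.SearchAsync` signature from the notebook; the prompt wiring below is illustrative, not part of the original example):

```csharp
// Retrieve the top 5 nearest matches instead of a single result.
// In the preview SDK, SearchAsync returns an IAsyncEnumerable<MemoryQueryResult>.
var results = kernel.Memory.SearchAsync(MemoryCollectionName, "what do I do for work?", limit: 5);

var facts = new List<string>();
await foreach (var result in results)
{
    facts.Add(result.Metadata.Text);
}

// Hypothetical follow-up prompt: hand all retrieved facts to the model as
// context and let it compile one complete answer from them.
var prompt = $"""
    Using only these facts:
    {string.Join("\n", facts)}

    Answer the question: what do I do for work?
    """;
```

With all three work-related facts present in the context, the completion can enumerate them instead of echoing whichever single vector happened to rank first.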

Hi @cchighman. Thank you for the very detailed response. The concern I have is that I'm trying to feed a very large amount of information into the knowledge base — in this specific case, associations of gene-disease. There are more than 20,000 of those associations, and I'd like to be able to ask simple questions like "How many diseases are associated with X?" or "Which genes share an association with disease Y?". Therefore, loading them all into context wouldn't work.
Late to respond here, so perhaps you've already solved this?
With a large 1-many set, could you use the model to first work out the name of the gene or disease in the statement?
That would at least give a good starting point for your next step.
Prompt could look something like:
```
Based on the following input
{{$input}}
Return JSON in the following format:
{"type":"disease|gene", "name":"name of the disease or gene", "confidence": value between 0 and 1 based on how confident you are with the response}
```
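That prompt could be wired up as an inline semantic function, sketched below against the preview SDK's `CreateSemanticFunction` extension (the function name and the downstream lookup are hypothetical, not from the notebook):

```csharp
// Hypothetical inline semantic function that extracts the entity first,
// so the 20,000+ associations can live in a structured store instead of context.
const string ExtractEntityPrompt = """
    Based on the following input
    {{$input}}
    Return JSON in the following format:
    {"type":"disease|gene", "name":"name of the disease or gene", "confidence": value between 0 and 1}
    """;

var extractEntity = kernel.CreateSemanticFunction(ExtractEntityPrompt, maxTokens: 100, temperature: 0);

var context = await extractEntity.InvokeAsync("Which genes share an association with disease Y?");
// context.Result holds the JSON; parse out "name", then query your
// association store for that entity rather than relying on vector recall alone.
```

The extracted name gives you a deterministic key for the 1:M lookup, and only that entity's associations need to be fed back to the model.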
Closing for now as I believe the original issue has been resolved, feel free to reopen if needed