[Question] does content.url in filename for websites make sense? (I want attribution per paragraph via separate prompt)

Open chaelli opened this issue 1 year ago • 2 comments

Context / Scenario

I changed the prompt so that the LLM includes the source for each paragraph of the answer, which lets me align the response more closely with the facts for my users. When I do that, I can only tell it to reference the filename, because that is what the LLM receives in the facts part of the prompt. For websites this is always "content.url", because it is set that way in https://github.com/microsoft/kernel-memory/blob/a1f280c42c4df9a60d1d5cecf0633d07ff927b1b/service/Core/MemoryService.cs#L120

Question

I wonder whether it would make more sense to put the URL there instead of a static string, or at least to include the URL in the facts where it exists.

chaelli avatar May 15 '24 06:05 chaelli

You should be able to swap content.url with the URL upon receiving the response; there is a property with the URL.
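A minimal sketch of that swap, assuming the response has already been converted into a plain answer string plus a list of source dictionaries. The function name `swap_placeholder` and the `url` key are hypothetical and are not the Kernel Memory API:

```python
def swap_placeholder(answer: str, sources: list[dict],
                     placeholder: str = "content.url") -> str:
    """Replace the static placeholder with the real URL.

    Assumption: each source is a dict with a hypothetical "url" key.
    The swap is only performed when exactly one distinct URL is present,
    because otherwise the placeholder is ambiguous.
    """
    urls = {s["url"] for s in sources if s.get("url")}
    if len(urls) != 1:
        # Zero or several candidate URLs: leave the answer untouched.
        return answer
    return answer.replace(placeholder, next(iter(urls)))
```

With a single source the placeholder is replaced; with two or more distinct URLs the answer is returned unchanged, since the text alone does not say which URL belongs to which occurrence.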

dluc avatar May 15 '24 07:05 dluc

This only works if there is just one relevant source. If there are multiple, I would not know which part of the answer is based on which page: they are all called content.url, so I cannot align separate sources with separate paragraphs. FYI, until I started using Kernel Memory, I just used a prompt like this:

Add a source reference to the end of each sentence. e.g. Apple is a fruit ([Reference page title](Reference page url)) (markdown link formatting). ...

chaelli avatar May 15 '24 07:05 chaelli

@dluc Do you have any preference between the options:

  • replacing "content.url" during indexing with the real URL value?
  • adding the URL as an additional value in the prompt?

Or none of them?

chaelli avatar May 27 '24 07:05 chaelli

I would try the approach with the prompt; it should be easier. Changing the indexing pipeline might have unexpected impact.
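As a rough illustration of the prompt approach, the sketch below labels each fact with its URL when one exists and falls back to the file name otherwise. The chunk dictionaries, their `url`, `file_name`, and `text` keys, and the `==== [Source: ...]` layout are all assumptions made for this example, not the template Kernel Memory uses:

```python
def format_facts(chunks: list[dict]) -> str:
    """Render retrieved chunks as a facts block with one source label each.

    Assumption: every chunk is a dict with hypothetical keys
    "text", "file_name", and an optional "url".
    """
    blocks = []
    for chunk in chunks:
        # Prefer the real URL; fall back to the file name (e.g. "content.url").
        source = chunk.get("url") or chunk["file_name"]
        blocks.append(f"==== [Source: {source}]\n{chunk['text']}")
    return "\n".join(blocks)
```

Because every fact then carries a distinct label, a prompt that asks for a reference per sentence or paragraph can point at the right page even when several web sources are retrieved.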

dluc avatar May 27 '24 07:05 dluc