
[FR] New mode: memory model

Open wwjCMP opened this issue 1 year ago • 3 comments

  1. By adding a "like" button, users can accumulate valuable question-answer pairs that lead to better responses.
  2. These new question-answer pairs can be accumulated in a new folder (or file) and modified as needed.
  3. This can be integrated with existing LLM memory components, such as mem0ai, or simply used as high-credibility prior knowledge for the current question-answering session.
  4. Of course, these question-answer pairs should also be vector-matched with the question asked.
  5. This ensures that answers to related questions become increasingly accurate. However, since users may need new inspiration, this should be implemented as a separate new mode.
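
Steps 1 and 2 above could be sketched roughly like this. This is a hypothetical sketch, not anything in obsidian-copilot: the `save_liked_qa` helper, the folder name, and the file layout are all made up for illustration.

```python
from datetime import date
from pathlib import Path

def save_liked_qa(question: str, answer: str,
                  memory_dir: str = "copilot-memory") -> Path:
    """Append a user-approved QA pair to a dedicated markdown file,
    so the user can later edit or prune it like any other note."""
    folder = Path(memory_dir)
    folder.mkdir(parents=True, exist_ok=True)
    note = folder / "liked-qa-pairs.md"
    entry = (
        f"\n## {date.today().isoformat()}\n\n"
        f"**Q:** {question}\n\n"
        f"**A:** {answer}\n"
    )
    # Appending keeps earlier curated pairs intact.
    with note.open("a", encoding="utf-8") as f:
        f.write(entry)
    return note

note_path = save_liked_qa("What is Vault QA?",
                          "A retrieval mode that answers from your notes.")
```

Because the pairs live in a plain markdown file inside the vault, they would be picked up by indexing and remain hand-editable, which covers the "can be modified as needed" part of step 2.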

wwjCMP avatar Sep 16 '24 00:09 wwjCMP

hmm, the thumbs-up button in ChatGPT and others mostly works as a positive signal labeled by users to serve as fine-tuning data for the next model iteration. In that form, it is not relevant to this client-side application.

From what you describe it seems you want to keep some QA pairs as "good" examples for future prompting. I'm not sure how that's going to work since QA pairs can be kinda random and belong to numerous topics. Then how do you organize them by topic? I can't think of a way to directly use them. Perhaps there's an example somewhere of how others are using this approach?

logancyang avatar Sep 16 '24 15:09 logancyang

First, this mode should be an enhancement of Vault QA, so it should be built on top of Vault QA.

I think treating this as a special context is the most direct approach. This context should apply across the entire chain of thought, acting as an additional vector database to be matched against. The similarity threshold for this database can be set higher.

Since this part of the content is manually curated and assumed high-quality by default, call it database A. The rest belongs to the general-quality database, B.

For example, given question C, first match database A to get fragment A-1 and generate an intermediate answer C-1. Then, using question C together with answer C-1, match database B to get fragment B-1.

Finally, generate the final answer from question C, fragment A-1, and fragment B-1.
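
The C → A-1 → C-1 → B-1 flow could be sketched as follows. This is a toy sketch of the idea, not the plugin's actual retrieval code: the word-count "embedding", the echo-style `llm` stand-in, both stores, and the two thresholds are all assumptions.

```python
import math

def cos(u, v):
    """Cosine similarity; returns 0 for zero vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

# Toy word-count "embedding" over a two-word vocabulary.
VOCAB = {"paris": 0, "cheese": 1}
def embed(text):
    vec = [0.0, 0.0]
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    return vec

def best_match(query_vec, store, threshold):
    """Best fragment above the similarity threshold, or '' if none."""
    hits = [(text, cos(query_vec, vec)) for text, vec in store]
    hits = [h for h in hits if h[1] >= threshold]
    return max(hits, key=lambda h: h[1])[0] if hits else ""

def two_stage_answer(question, db_a, db_b, llm,
                     threshold_a=0.85, threshold_b=0.5):
    # Stage 1: match curated store A with a stricter threshold -> fragment A-1.
    a_frag = best_match(embed(question), db_a, threshold_a)
    # Intermediate answer C-1 from the question plus the curated fragment.
    interim = llm(f"Q: {question}\nContext: {a_frag}")
    # Stage 2: match general store B using question C plus answer C-1.
    b_frag = best_match(embed(question + " " + interim), db_b, threshold_b)
    # Final answer from question C, fragment A-1, and fragment B-1;
    # A-1 is always part of the final prompt when it matched.
    return llm(f"Q: {question}\nA-1: {a_frag}\nB-1: {b_frag}")

db_a = [("Paris is the capital of France.", embed("paris"))]
db_b = [("My Paris note: great cheese shops in Paris.", embed("paris cheese"))]
echo_llm = lambda prompt: prompt  # stand-in for a real model call
answer = two_stage_answer("Tell me about paris", db_a, db_b, echo_llm)
```

The key design point is that stage 2's query vector includes the intermediate answer C-1, so the general-notes search is steered by whatever the curated knowledge already established.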

The above is a hypothetical process; I am not familiar with how the chain of thought actually runs in Vault QA mode.

The core points are as follows:

  1. The data is split into two parts, A and B. A is the data manually curated during everyday Q&A; B is the regular Obsidian note data.
  2. If a question matches data A with high similarity, the weight of the matched knowledge should be increased.
  3. Only after this step is data B matched.
  4. The final step must always include the fragment A-1 matched from data A.
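
Points 2–4 amount to a weighted merge of two result lists. A minimal sketch, assuming a boost multiplier and a guarantee step that are my own inventions, not anything implemented in the plugin:

```python
CURATED_BOOST = 1.5  # assumed multiplier for curated (A) hits; would need tuning

def merge_hits(a_hits, b_hits, top_k=3):
    """Merge curated (A) and general (B) retrieval hits by boosted score,
    guaranteeing the best curated fragment survives the top-k cut."""
    scored = [(text, score * CURATED_BOOST, "A") for text, score in a_hits]
    scored += [(text, score, "B") for text, score in b_hits]
    ranked = sorted(scored, key=lambda h: h[1], reverse=True)[:top_k]
    if a_hits and all(src != "A" for _, _, src in ranked):
        # Force the top curated fragment into the context (core point 4).
        best_a = max(a_hits, key=lambda h: h[1])
        ranked[-1] = (best_a[0], best_a[1] * CURATED_BOOST, "A")
    return ranked

merged = merge_hits([("A-1", 0.8)], [("B-1", 0.9), ("B-2", 0.7)], top_k=2)
```

Here the curated fragment A-1 outranks the higher-raw-score B-1 once boosted, and the fallback branch covers the case where no curated hit would otherwise make the cut.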

The motivation is that when we ask similar or related questions repeatedly, the high-quality knowledge manually curated in earlier sessions is guaranteed to be matched first.

wwjCMP avatar Sep 16 '24 17:09 wwjCMP

Admittedly, the approach above is fairly crude. Some projects are currently working on attaching external memory, which allows a degree of customization of model output without retraining.

wwjCMP avatar Sep 17 '24 00:09 wwjCMP