graphrag [Issue]: <"GOAL" is duplicated in the MAP_SYSTEM_PROMPT>

Is there an existing issue for this?

[X] I have searched the existing issues
[X] I have checked #657 to validate if my issue is covered by community support

Describe the issue

MAP_SYSTEM_PROMPT = """ ---Role---

You are a helpful assistant responding to questions about data in the tables provided.

---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response. If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:

Description: A comprehensive description of the point.
Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response should be JSON formatted as follows: {{ "points": [ {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}}, {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}} ] }}

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Points supported by data should list the relevant reports as references as follows: "This is an example sentence supported by data references [Data: Reports (report ids)]"

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example: "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables.

Do not include information where the supporting evidence for it is not provided.

---Data tables---

{context_data}

---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response. If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:

Description: A comprehensive description of the point.
Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Points supported by data should list the relevant reports as references as follows: "This is an example sentence supported by data references [Data: Reports (report ids)]"

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example: "Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables.

Do not include information where the supporting evidence for it is not provided.

The response should be JSON formatted as follows: {{ "points": [ {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}}, {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}} ] }} """

Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

GraphRAG Version:
Operating System:
Python Version:
Related Issues:

Aug 05 '24 02:08 NeverKai
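
The prompt above is a Python template. `{context_data}` is a placeholder, and the doubled braces `{{ }}` suggest it is filled in with `str.format`, which turns them into literal `{ }` so the JSON example survives. The sketch below assumes that. It also shows how a reply in the requested `{"points": [...]}` shape could be parsed and how the five-id citation rule could be applied. `TEMPLATE`, `render`, `parse_points` and `format_reference` are hypothetical names for illustration, not GraphRAG APIs, and `TEMPLATE` is a trimmed excerpt rather than the full prompt.

```python
import json

# Hypothetical, trimmed excerpt of MAP_SYSTEM_PROMPT (not the full GraphRAG prompt).
TEMPLATE = (
    "---Data tables---\n\n{context_data}\n\n"
    "The response should be JSON formatted as follows: "
    '{{ "points": [ {{"description": "Description of point 1", "score": score_value}} ] }}'
)


def render(context_data: str) -> str:
    # str.format fills {context_data}; "{{" and "}}" become literal "{" and "}".
    return TEMPLATE.format(context_data=context_data)


def parse_points(raw: str) -> list[tuple[str, int]]:
    # Expects a reply shaped like {"points": [{"description": ..., "score": ...}]}.
    data = json.loads(raw)
    return [(p["description"], int(p["score"])) for p in data.get("points", [])]


def format_reference(ids_by_relevance: list[int], limit: int = 5) -> str:
    # Show at most `limit` ids (assumed already sorted by relevance), then "+more".
    shown = [str(i) for i in ids_by_relevance[:limit]]
    if len(ids_by_relevance) > limit:
        shown.append("+more")
    return f"[Data: Reports ({', '.join(shown)})]"


if __name__ == "__main__":
    print(render("id|title\n1|Company Y"))
    reply = '{"points": [{"description": "Person X owns Company Y [Data: Reports (1)]", "score": 80}]}'
    print(parse_points(reply))
    print(format_reference([2, 7, 64, 46, 34, 1, 3]))  # [Data: Reports (2, 7, 64, 46, 34, +more)]
```

Filling the template with `str.format` is why the literal braces in the JSON example have to be doubled. A single unescaped `{` there would make `str.format` raise an error instead of producing the prompt.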

Yes. You can delete the duplicated part, including the one in <REDUCE_SYSTEM_PROMPT>. Doing so has no negative impact.

Aug 05 '24 08:08 Vaccy-Zhu
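
Before deleting anything, it helps to see exactly which parts of a prompt are repeated. A paragraph-level comparison is enough for that. The sketch below is a hypothetical helper, not part of GraphRAG. It treats paragraphs as text separated by blank lines, which matches how the prompt above is laid out, and `drop_repeats` keeps only the first copy of each paragraph.

```python
from collections import Counter


def paragraphs(prompt: str) -> list[str]:
    # Split on blank lines and drop empty chunks.
    return [p.strip() for p in prompt.split("\n\n") if p.strip()]


def duplicated_paragraphs(prompt: str) -> list[str]:
    # Paragraphs that appear more than once, in first-seen order.
    counts = Counter(paragraphs(prompt))
    return [p for p, n in counts.items() if n > 1]


def drop_repeats(prompt: str) -> str:
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return "\n\n".join(dict.fromkeys(paragraphs(prompt)))


if __name__ == "__main__":
    prompt = (
        "---Goal---\n\nSummarize the tables.\n\n"
        "---Data tables---\n\n{context_data}\n\n"
        "---Goal---\n\nSummarize the tables."
    )
    print(duplicated_paragraphs(prompt))  # ['---Goal---', 'Summarize the tables.']
    print(len(prompt) - len(drop_repeats(prompt)), "characters removed")
```

Paragraph-level deduplication drops every repeated paragraph, including short ones that might repeat for a good reason. The maintainer reply below also says the repetition in MAP_SYSTEM_PROMPT is deliberate, so removing it trades a shorter prompt for losing the end-of-prompt reminder.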

I have a PR that does that, but they didn't merge it. Maybe they want you to spend more money, haha.

Aug 05 '24 15:08 KylinMountain

This duplication is intentional. LLMs, particularly those with large context windows, often display a pattern of "forgetfulness" as they process the context. We have found that repeating the goal near the end of the prompt improves responses by reminding the LLM of the task.

Aug 05 '24 17:08 natoverse
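
The layout natoverse describes can be sketched as a small prompt builder that states the goal before the data tables and again after them. `build_prompt` and its `repeat_goal` flag are hypothetical names for illustration. GraphRAG ships the repeated goal as literal text in MAP_SYSTEM_PROMPT, as quoted in the issue, rather than through a helper like this.

```python
def build_prompt(role: str, goal: str, context_data: str, repeat_goal: bool = True) -> str:
    # Hypothetical helper: Role, Goal, then the (possibly very long) data tables.
    sections = ["---Role---", role, "---Goal---", goal, "---Data tables---", context_data]
    if repeat_goal:
        # Restate the goal after the data so the instructions also sit
        # near the end of the prompt, where the model reads them last.
        sections += ["---Goal---", goal]
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_prompt(
        role="You are a helpful assistant responding to questions about data in the tables provided.",
        goal="Generate a response consisting of a list of key points that responds to the user's question.",
        context_data="id|title\n1|Company Y",
    )
    print(prompt)
```

The flag makes it easy to compare answers with and without the repeated goal. That comparison is one way to weigh the trade-off raised earlier in the thread: extra prompt text on every map call against answer quality.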
