graphrag
[Bug]: The gleaning is *not* including the original input
Describe the bug
If you look at the below function:
https://github.com/microsoft/graphrag/blob/309abc982f158c38099c6098d30b35a20972d258/graphrag/index/graph/extractors/graph/graph_extractor.py#L148C5-L182C23
```python
async def _process_document(
    self, text: str, prompt_variables: dict[str, str]
) -> str:
    response = await self._llm(
        self._extraction_prompt,
        variables={
            **prompt_variables,
            self._input_text_key: text,
        },
    )
    results = response.output or ""

    # Repeat to ensure we maximize entity count
    for i in range(self._max_gleanings):
        glean_response = await self._llm(
            CONTINUE_PROMPT,
            name=f"extract-continuation-{i}",
            history=response.history or [],
        )
        results += glean_response.output or ""

        # if this is the final glean, don't bother updating the continuation flag
        if i >= self._max_gleanings - 1:
            break

        continuation = await self._llm(
            LOOP_PROMPT,
            name=f"extract-loopcheck-{i}",
            history=glean_response.history or [],
            model_parameters=self._loop_args,
        )
        if continuation.output != "YES":
            break

    return results
```
The call that performs the gleaning does not include the original input:

```python
glean_response = await self._llm(
    CONTINUE_PROMPT,
    name=f"extract-continuation-{i}",
    history=response.history or [],
)
```
`response.history` contains only the last output from the LLM, i.e. it is missing the GRAPH_EXTRACTION_PROMPT prompt.
I have verified this by looking at the exchange in the debugger as well.
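To illustrate the concern, here is a minimal, self-contained sketch. The prompts, the message format, and the two history-building helpers are hypothetical stand-ins (not graphrag code); they only demonstrate that the gleaning call can see the original chunk only if the history it receives carries the original prompt:

```python
# Hypothetical stand-ins for illustration; not graphrag's actual prompts or types.
EXTRACTION_PROMPT = "Extract entities from the following text: {input_text}"
CONTINUE_PROMPT = "MANY entities were missed in the last extraction. Add them below."


def build_full_history(input_text: str, first_output: str) -> list[dict]:
    # Full exchange: the original extraction prompt (containing the chunk)
    # followed by the model's first reply.
    return [
        {"role": "user", "content": EXTRACTION_PROMPT.format(input_text=input_text)},
        {"role": "assistant", "content": first_output},
    ]


def build_truncated_history(first_output: str) -> list[dict]:
    # What the issue describes: only the last LLM output survives.
    return [{"role": "assistant", "content": first_output}]


chunk = "Alice works at Acme."
full = build_full_history(chunk, "(entity: Alice)")
truncated = build_truncated_history("(entity: Alice)")

# The continuation call can only reference the chunk if some message in the
# history it receives actually contains that text.
print(any(chunk in m["content"] for m in full))       # True
print(any(chunk in m["content"] for m in truncated))  # False
```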
Is this the intended implementation? I would have thought that gleaning would require, at minimum, the original text (chunk). As it stands, the gleaning step seems to use only the last response.
Please guide. Thanks.
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: