[Bug] Auto Prompt Tuning - Where is entity_extraction.txt?

Open Silence-Well opened this issue 7 months ago • 1 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

Running prompt_tune completed without errors and produced three files.

poetry run poe prompt_tune --root /ragtest/ --config /ragtest/settings.yaml --domain "xxx" --selection-method all
tree
.
├── community_report_graph.txt
├── extract_graph.txt
└── summarize_descriptions.txt

However, the entity_extraction.txt file shown in the documentation was not generated.

I noticed that the extract_graph option uses extract_graph.txt, so I replaced the prompt with the generated file, but indexing then failed with an error.

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

indexing-engine.log

Traceback (most recent call last):
  File "/graphrag/graphrag/index/run/run_pipeline.py", line 129, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/workflows/extract_graph.py", line 46, in run_workflow
    entities, relationships, raw_entities, raw_relationships = await extract_graph(
                                                               ^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/workflows/extract_graph.py", line 88, in extract_graph
    extracted_entities, extracted_relationships = await extractor(
                                                  ^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/operations/extract_graph/extract_graph.py", line 83, in extract_graph
    entities = _merge_entities(entity_dfs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/operations/extract_graph/extract_graph.py", line 107, in _merge_entities
    all_entities.groupby(["title", "type"], sort=False)
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
           ^^^^^^^^^^^^^^^^^
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
                               ^^^^^^^^^^^^
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'title'
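
The last frame of the traceback can be reproduced in isolation. This is a minimal sketch, not GraphRAG's code: it assumes that every per-chunk extraction returned an empty DataFrame (for example because the model output did not match the format the parser expects), so the concatenated frame has no "title" column to group on. Whether that is what happened here is an assumption; the variable names are illustrative.

```python
import pandas as pd

# Assumption: no entities were parsed from any text unit, so each
# per-chunk result is an empty frame without any columns.
entity_dfs = [pd.DataFrame(), pd.DataFrame()]
all_entities = pd.concat(entity_dfs, ignore_index=True)

# Columns the groupby call in the traceback relies on.
missing = [c for c in ("title", "type") if c not in all_entities.columns]

try:
    # Same call shape as the failing line in _merge_entities.
    all_entities.groupby(["title", "type"], sort=False)
    error = None
except KeyError as exc:
    error = exc

print("missing columns:", missing)
print("groupby error:", repr(error))
```

If this matches the real cause, the KeyError is a symptom of an empty extraction result rather than a problem in the groupby step itself, so the raw model responses are the place to look.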

When I commented out extract_graph and used entity_extraction instead, indexing worked normally. However, I'm not sure this is the correct configuration.

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 2.2.1
  • Operating System: Mac M
  • Python Version: Python 3.12
  • Related Issues:

Silence-Well avatar May 10 '25 14:05 Silence-Well

Same here. I do have extract_graph.txt, but when I change its content I hit this bug. If anyone solves this problem, please tell me. Thank you.

wcx2333 avatar May 22 '25 02:05 wcx2333

The docs are a bit outdated: extract_graph.txt is your entity_extraction prompt. You need to edit extract_graph in settings.yaml.

gona-sreelatha avatar Jun 10 '25 08:06 gona-sreelatha

The same issue persists. After specifying the domain and generating a new extract_graph.txt file, the entity types I obtained are [character, emotion, relationship, setting].

Therefore:

extract_graph:
  model_id: default_chat_model
  #prompt: "prompts/extract_graph.txt"
  prompt: "prompt-paper/extract_graph.txt"
  #entity_types: [organization,person,geo,event]
  entity_types: [character, emotion, relationship, setting]
  max_gleanings: 1

Should I configure it this way? Even with this configuration, it still throws the error KeyError: 'title'.
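
One thing that can be ruled out before digging further is whether the custom prompt path actually resolves to a file. The sketch below is a hypothetical helper, not part of GraphRAG: it assumes the prompt path in settings.yaml is resolved relative to the --root directory, and it only lists the {placeholder} tokens it finds without knowing which ones GraphRAG requires.

```python
import re
from pathlib import Path


def check_prompt(root, prompt_rel_path):
    """Report whether a prompt file exists under root and which {name} tokens it contains.

    Hypothetical helper for debugging; the root-relative resolution is an assumption.
    """
    path = Path(root) / prompt_rel_path
    if not path.is_file():
        return {"path": str(path), "exists": False, "placeholders": []}
    text = path.read_text(encoding="utf-8")
    # Collect simple {name} tokens so two prompt files can be compared by eye.
    placeholders = sorted(set(re.findall(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", text)))
    return {"path": str(path), "exists": True, "placeholders": placeholders}


if __name__ == "__main__":
    # Paths taken from the command and config quoted in this thread.
    print(check_prompt("/ragtest", "prompt-paper/extract_graph.txt"))
    print(check_prompt("/ragtest", "prompts/extract_graph.txt"))
```

Comparing the output for the default prompt and the tuned prompt shows whether both files are found and whether they expose the same placeholders.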

Climber848 avatar Aug 11 '25 16:08 Climber848