[Bug] Auto Prompt Tuning - Where is entity_extraction.txt?
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
Running prompt_tune finished without any errors and produced three files.
poetry run poe prompt_tune --root /ragtest/ --config /ragtest/settings.yaml --domain "xxx" --selection-method all
tree
.
├── community_report_graph.txt
├── extract_graph.txt
└── summarize_descriptions.txt
However, the entity_extraction.txt file shown in the documentation does not exist.
I noticed that the extract_graph section of settings.yaml uses extract_graph.txt, so I replaced that prompt with the newly generated file, but indexing then failed with an error.
extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
indexing-engine.log
Traceback (most recent call last):
  File "/graphrag/graphrag/index/run/run_pipeline.py", line 129, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/workflows/extract_graph.py", line 46, in run_workflow
    entities, relationships, raw_entities, raw_relationships = await extract_graph(
                                                               ^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/workflows/extract_graph.py", line 88, in extract_graph
    extracted_entities, extracted_relationships = await extractor(
                                                  ^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/operations/extract_graph/extract_graph.py", line 83, in extract_graph
    entities = _merge_entities(entity_dfs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/graphrag/graphrag/index/operations/extract_graph/extract_graph.py", line 107, in _merge_entities
    all_entities.groupby(["title", "type"], sort=False)
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
           ^^^^^^^^^^^^^^^^^
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
                               ^^^^^^^^^^^^
  File "/graphrag_source/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'title'
I tried using entity_extraction and commented out extract_graph, and it worked normally. However, I'm not sure if this is correct.
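One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: pandas raises KeyError: 'title' from groupby when the frame being grouped has no "title" column at all, which is what happens if no entities were parsed from the model output. The sketch below reproduces only that pandas behavior. merge_entities_sketch is a hypothetical, simplified stand-in for the grouping step named in the traceback, not GraphRAG's actual _merge_entities.

```python
import pandas as pd


def merge_entities_sketch(entity_dfs):
    """Hypothetical stand-in: concatenate per-document frames and group them."""
    all_entities = pd.concat(entity_dfs, ignore_index=True)
    # Same grouping keys as the line shown in the traceback.
    grouped = all_entities.groupby(["title", "type"], sort=False)
    return grouped.size().reset_index(name="count")


# Assumption: nothing was extracted, so every frame has no columns at all.
empty_frames = [pd.DataFrame(), pd.DataFrame()]

try:
    merge_entities_sketch(empty_frames)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # prints KeyError('title')
```

If this is what is happening, the prompt file itself may be the place to look: a tuned prompt whose output format the extractor cannot parse would yield no entities, and the grouping step would then fail in this way.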
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
# Paste your config here
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 2.2.1
- Operating System: Mac M
- Python Version: Python 3.12
- Related Issues:
Same here, except that I do have extract_graph.txt. When I try to change its content, I hit this bug. If anyone solves this problem, please let me know. Thank you.
The docs are a bit outdated: extract_graph.txt is your entity_extraction prompt. You need to edit the extract_graph section in settings.yaml.
The same issue persists. After specifying the domain and generating a new extract_graph.txt file, the entity types I obtained are entity_types: [character, emotion, relationship, setting].
Therefore:
extract_graph:
  model_id: default_chat_model
  #prompt: "prompts/extract_graph.txt"
  prompt: "prompt-paper/extract_graph.txt"
  #entity_types: [organization,person,geo,event]
  entity_types: [character, emotion, relationship, setting]
  max_gleanings: 1
Should I configure it this way? Even with this configuration, it still throws the error KeyError: 'title'.
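Before re-running the indexer with a configuration like the one above, it may help to confirm that the prompt path actually points at a readable, non-empty file. The following is a minimal sketch under the assumption that relative prompt paths are resolved against the directory passed as --root. check_prompt is a hypothetical helper written for illustration and is not part of GraphRAG.

```python
from pathlib import Path


def check_prompt(root, prompt):
    """Resolve a prompt path against the project root and describe the file."""
    path = Path(prompt)
    if not path.is_absolute():
        # Assumption: relative prompt paths are relative to the project root.
        path = Path(root) / path
    if not path.is_file():
        return f"missing: {path}"
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return f"empty: {path}"
    return f"ok: {path} ({len(text)} characters)"


# Values taken from the command and configuration quoted in this thread.
print(check_prompt("/ragtest", "prompt-paper/extract_graph.txt"))
```

A "missing" or "empty" result would point to a path problem rather than a prompt-content problem. An "ok" result only confirms that the file is present, not that its contents produce parseable output.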