graphrag
[Bug]: Handling text without any entities and relationships
Do you need to file an issue?
- [ ] I have searched the existing issues and this bug is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
When the input text is as simple as "Hello, world", there may be no entities or relationships to extract. This currently throws an error because the relevant keys are never extracted. Could we set a default value in such cases?
Steps to reproduce
Use a simple sentence like "Hello world"
Expected Behavior
2025-04-15 07:57:54,494|ERROR|graphrag.index.run.run_pipeline:156:error running workflow extract_graph
Traceback (most recent call last):
File "/home/abdul/data/GraphRAG/src/graphrag/index/run/run_pipeline.py", line 143, in _run_pipeline
result = await workflow_function(config, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 46, in run_workflow
entities, relationships = await extract_graph(
^^^^^^^^^^^^^^^^^^^^
File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 82, in extract_graph
extracted_entities, extracted_relationships = await extractor(
^^^^^^^^^^^^^^^^
File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 133, in extract_graph
relationships = _merge_relationships(relationship_dfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 170, in _merge_relationships
.agg(
^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 1432, in aggregate
result = op.agg()
^^^^^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 190, in agg
return self.agg_dict_like()
^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 423, in agg_dict_like
return self.agg_or_apply_dict_like(op_name="agg")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 1608, in agg_or_apply_dict_like
result_index, result_data = self.compute_dict_like(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 462, in compute_dict_like
func = self.normalize_dictlike_arg(op_name, selected_obj, func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 663, in normalize_dictlike_arg
raise KeyError(f"Column(s) {list(cols)} do not exist")
KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
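The failure at the bottom of this traceback can be reproduced with pandas alone. The sketch below is an assumption about what reaches the merge step, not a copy of the GraphRAG code: it supposes the relationship frame has the `source` and `target` columns but none of the attribute columns. Those two column names, the `text_unit_ids` output name, and the `merge_relationships` helper are illustrative; only `description`, `source_id`, and `weight` come from the error message.

```python
import pandas as pd

# Assumption: nothing was extracted, so the frame has the edge endpoints
# but no attribute columns and no rows.
empty_relationships = pd.DataFrame(columns=["source", "target"])


def merge_relationships(relationships: pd.DataFrame) -> pd.DataFrame:
    """Group duplicate edges and aggregate their attributes (illustrative only)."""
    return (
        relationships.groupby(["source", "target"], sort=False)
        .agg(
            description=("description", list),
            text_unit_ids=("source_id", list),
            weight=("weight", "sum"),
        )
        .reset_index()
    )


try:
    merge_relationships(empty_relationships)
    error_message = None
except KeyError as exc:
    # pandas rejects the named aggregation because the columns are missing
    error_message = str(exc)

print(error_message)
```

The named aggregation is checked against the frame's columns before any grouping work happens, so the error appears even though there are zero rows to aggregate.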
GraphRAG Config Used
LLM: gpt-4o; embedding model: text-embedding-ada-002
models:
  default_chat_model:
    type: openai_chat
    auth_type: api_key
    api_key: <REDACTED>
    model: ${LLM_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: o200k_base
  default_embedding_model:
    type: openai_embedding
    auth_type: api_key
    api_key: <REDACTED>
    model: ${BATCH_EMBEDDING_MODEL_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: cl100k_base
input:
  type: blob
  file_type: csv
  base_dir: input
  container_name: test002
  connection_string: <REDACTED>;
  metadata: []
chunks:
  size: 1200
  overlap: 100
  group_by_columns:
    - id
  encoding_model: o200k_base
  prepend_metadata: true
  chunk_size_includes_metadata: true
output:
  type: blob
  base_dir: output
  container_name: test002
  connection_string: <REDACTED>
cache:
  type: blob
  base_dir: cache
  container_name: test002
  connection_string: <REDACTED>
reporting:
  type: blob
  base_dir: logs
  container_name: test002
  connection_string: <REDACTED>
vector_store:
  default_vector_store:
    type: cosmosdb
    connection_string: <REDACTED>
    url: <REDACTED>
    api_key: <REDACTED>
    database_name: graphrag-evaluation
    vector_size: 1536
    collection_name: test002
    container_name: test002
    overwrite: true
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
extract_graph:
  model_id: default_chat_model
  prompt: prompts/extract_graph.txt
  entity_types:
    - organization
    - person
    - geo
    - event
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: prompts/summarize_descriptions.txt
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english
cluster_graph:
  max_cluster_size: 10
extract_claims:
  enabled: true
  model_id: default_chat_model
  prompt: prompts/extract_claims.txt
  description: Any claims or facts that could be relevant to information discovery.
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: prompts/community_report_graph.txt
  text_prompt: prompts/community_report_text.txt
  max_length: 2000
  max_input_length: 8000
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  embeddings: false
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/local_search_system_prompt.txt
global_search:
  chat_model_id: default_chat_model
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/basic_search_system_prompt.txt
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 2.1.0
- Operating System: Linux
- Python Version: 3.12
- Related Issues:
I'll see if we can put a fallback in for these scenarios.
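For anyone who needs a workaround in the meantime, a fallback along these lines might look like the sketch below. It is only an illustration of the idea and not the fix that ends up in GraphRAG: `merge_relationships_safe`, the `source`/`target` column names, and the `text_unit_ids` output name are assumptions, while `description`, `source_id`, and `weight` are taken from the KeyError in the report. Frames that are empty or lack the required columns are skipped, and an empty frame with the expected output columns is returned when nothing usable remains.

```python
import pandas as pd

# Columns the aggregation needs (assumed names except those in the KeyError)
REQUIRED_COLUMNS = {"source", "target", "description", "source_id", "weight"}
# Columns the merged result is expected to expose (assumed)
OUTPUT_COLUMNS = ["source", "target", "description", "text_unit_ids", "weight"]


def merge_relationships_safe(relationship_dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Merge relationship frames, tolerating chunks where nothing was extracted."""
    usable = [
        df
        for df in relationship_dfs
        if not df.empty and REQUIRED_COLUMNS.issubset(df.columns)
    ]
    if not usable:
        # Fallback: an empty result with the expected schema instead of a KeyError
        return pd.DataFrame(columns=OUTPUT_COLUMNS)

    all_relationships = pd.concat(usable, ignore_index=True)
    return (
        all_relationships.groupby(["source", "target"], sort=False)
        .agg(
            description=("description", list),
            text_unit_ids=("source_id", list),
            weight=("weight", "sum"),
        )
        .reset_index()
    )


# A chunk like "Hello, world" that produced no relationships
nothing_extracted = pd.DataFrame(columns=["source", "target"])
empty_result = merge_relationships_safe([nothing_extracted])

# A chunk that produced the same edge twice
extracted = pd.DataFrame(
    {
        "source": ["A", "A"],
        "target": ["B", "B"],
        "description": ["first mention", "second mention"],
        "source_id": ["t1", "t2"],
        "weight": [1.0, 2.0],
    }
)
merged = merge_relationships_safe([nothing_extracted, extracted])
```

Returning an empty frame with a fixed schema keeps downstream steps from having to special-case missing columns, although whether later workflows accept a graph with no relationships is a separate question.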
@natoverse I'm getting this issue on the example from Getting Started
This is fixed on the v3/main branch and will be released in the next few weeks.