Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
This is my configuration:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q6_K-Q8.gguf
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1
parallelization:
  stagger: 0.3
async_mode: threaded # or asyncio
embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1
This is my error log:
00:17:52,468 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,469 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:17:52,469 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,470 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
This is my console log:
🚀 Reading settings from ragtest/settings.yaml
/opt/anaconda3/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_base_text_units
id ... n_tokens
0 4d58d18fc8bedcf601e27bb07cdc3f8e ... 300
1 288d3e4ebc58510cc7153d89f5946a5f ... 300
2 a13a2f2347995e03c804450b08354b12 ... 208
3 d53faf2c8abaa7cd58e253d514fe6ad3 ... 8
[4 rows x 5 columns]
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (1 filtered) ━ 100% … 0…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.
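For anyone reading the traceback above: the ValueError comes from pandas, not from GraphRAG itself. A minimal sketch of how that message can arise is below. It assumes the frame being assigned has no rows, so expanding it yields zero columns while the assignment target names two. The column names "level" and "clustered" are hypothetical stand-ins, not the names GraphRAG uses.

```python
import pandas as pd


def reproduce_key_length_error() -> str:
    """Return the message pandas raises when two target columns receive a zero-column frame."""
    # An empty frame stands in for a clustering result with no rows (an assumption).
    output_df = pd.DataFrame({"clustered": pd.Series([], dtype=object)})

    # Expanding an empty list of tuples gives a frame with zero columns.
    expanded = pd.DataFrame(output_df["clustered"].tolist(), index=output_df.index)

    try:
        # Two keys on the left, zero columns on the right.
        output_df[["level", "clustered"]] = expanded
    except ValueError as err:
        return str(err)
    return "no error"


print(reproduce_key_length_error())
```

If this is what happens in the pipeline, the real problem is upstream of clustering: the step received nothing to cluster.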
Hi! Can you please check your cache files or output files to see whether the entity extraction was successful? Most errors on the clustering step relate to faulty entity extractions, either by 0 extracted entities or by wrong responses from the ll..
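One rough way to do that check is to count the nodes and edges in the GraphML text that the extraction step prints as entity_graph. The sketch below takes the GraphML as a string; how you load it from your output or cache folder depends on your run and is not shown. The helper name count_graph_elements is made up for this example, and the sample uses the standard GraphML namespace.

```python
import xml.etree.ElementTree as ET


def count_graph_elements(graphml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document given as text."""
    root = ET.fromstring(graphml_text)
    nodes = 0
    edges = 0
    for element in root.iter():
        # Tags look like "{namespace}node"; compare only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "node":
            nodes += 1
        elif local_name == "edge":
            edges += 1
    return nodes, edges


# A graph with no entities, as it might look after a failed extraction.
empty_graph = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"></graph></graphml>'
)

# A tiny graph with two entities and one relationship.
small_graph = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="A"/><node id="B"/><edge source="A" target="B"/>'
    "</graph></graphml>"
)

print(count_graph_elements(empty_graph))
print(count_graph_elements(small_graph))
```

A result of zero nodes would point at the extraction step rather than at clustering.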
It means that there is something wrong with the result of LLM processing, right?
"<|COMPLETE|>
Let me know if you'd like to try another example! I'm ready when you are."
This error is most likely caused by your embedding model or chat model not loading correctly. You can refer to my modified configuration.
Hi @451222664
I am also getting the same error!
I have pasted the logs below; it feels like an issue with Ollama. Please confirm whether you are getting the same logs.
Hi @451222664 Judging by the response you provided, yes: the LLM you're using is ignoring the output format we are looking for and is being more "chatty". I would suggest doing some prompt tuning to try to force the LLM into the format we need for parsing.
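To spot a "chatty" response before it reaches clustering, a cached response can be checked for delimited records. The sketch below is not GraphRAG's parser. It assumes the delimiters "<|>", "##" and "<|COMPLETE|>"; if your prompt has been customized, check which ones it actually uses. The function name parse_records is hypothetical.

```python
# Assumed delimiters; verify them against your own extraction prompt.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(response: str) -> list[list[str]]:
    """Split a raw LLM response into records, skipping anything that is not a tuple."""
    body = response.replace(COMPLETION_DELIMITER, "")
    records = []
    for chunk in body.split(RECORD_DELIMITER):
        chunk = chunk.strip()
        # Keep only chunks shaped like ("type"<|>field<|>field...).
        if chunk.startswith("(") and chunk.endswith(")") and TUPLE_DELIMITER in chunk:
            fields = [f.strip().strip('"') for f in chunk[1:-1].split(TUPLE_DELIMITER)]
            records.append(fields)
    return records


# The response quoted earlier in this thread contains no records at all.
chatty = (
    "<|COMPLETE|>\n"
    "Let me know if you'd like to try another example! I'm ready when you are."
)

# An illustrative, made-up response in the expected shape.
well_formed = (
    '("entity"<|>ALICE<|>PERSON<|>A researcher)##'
    '("relationship"<|>ALICE<|>BOB<|>Colleagues<|>5)<|COMPLETE|>'
)

print(len(parse_records(chatty)), len(parse_records(well_formed)))
```

If most cached responses parse to zero records, prompt tuning or a different model is probably needed.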
It turns out Ollama was not started properly; restarting the service fixed the issue.
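A quick check before running the indexer can catch a server that is not running. The sketch below assumes the local server exposes an OpenAI-style GET /models route under the configured api_base, which may not hold for every server or version. The helper names models_url and endpoint_is_up are made up for this example.

```python
import urllib.request


def models_url(api_base: str) -> str:
    """Build the model-listing URL from an api_base such as http://localhost:1234/v1."""
    return api_base.rstrip("/") + "/models"


def endpoint_is_up(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the model-listing request with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

For example, call endpoint_is_up("http://localhost:1234/v1") with the api_base from your settings.yaml. If it returns False, start or restart the model server first.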
Hi! We are consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657
I am also using Ollama, with the llama3.2 model, and am facing the same issue.
