graphrag Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key

This is my configuration:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q6_K-Q8.gguf
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1

This is my error log:

00:17:52,468 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,469 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:17:52,469 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,470 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

This is my console log:

🚀 Reading settings from ragtest/settings.yaml
/opt/anaconda3/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                 id  ... n_tokens
0  4d58d18fc8bedcf601e27bb07cdc3f8e  ...      300
1  288d3e4ebc58510cc7153d89f5946a5f  ...      300
2  a13a2f2347995e03c804450b08354b12  ...      208
3  d53faf2c8abaa7cd58e253d514fe6ad3  ...        8

[4 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (1 filtered) ━ 100% … 0…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

Jul 07 '24 16:07 451222664

Hi! Can you please check your cache or output files to see whether the entity extraction was successful? Most errors in the clustering step come from faulty entity extractions, caused either by 0 extracted entities or by wrong responses from the ll..
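
One way to run that check, as a minimal sketch: the console log above shows that `create_base_extracted_entities` produces an `entity_graph` column holding a GraphML string, so counting the `<node>` and `<edge>` elements in that string shows whether any entities were extracted. The helper name `count_graph_items` and both sample graphs are made up for illustration; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace (the same one visible in the console log).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graph_items(graphml: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document string."""
    root = ET.fromstring(graphml)
    nodes = sum(1 for _ in root.iter(f"{GRAPHML_NS}node"))
    edges = sum(1 for _ in root.iter(f"{GRAPHML_NS}edge"))
    return nodes, edges


# Hypothetical sample: an empty graph, as a failed extraction might produce.
EMPTY_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"></graph></graphml>'
)

# Hypothetical sample: two entities joined by one relationship.
SMALL_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="ALICE"/><node id="ACME"/>'
    '<edge source="ALICE" target="ACME"/>'
    "</graph></graphml>"
)

if __name__ == "__main__":
    print("empty graph:", count_graph_items(EMPTY_GRAPH))
    print("small graph:", count_graph_items(SMALL_GRAPH))
```

To apply this to a real run, you would load the `entity_graph` value from that run's output and pass the string to `count_graph_items`. The file location depends on your GraphRAG version, so treat any path as an assumption. A result with zero nodes would point to extraction, not clustering itself, as the failing step.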

Jul 07 '24 18:07 AlonsoGuevara

It means that there is something wrong with the result of LLM processing, right?

"<|COMPLETE|> 


Let me know if you'd like to try another example!  I'm ready when you are."

Jul 08 '24 10:07 451222664

This error is probably caused by your embedding model or chat model not loading correctly. You can refer to my configuration modification.

Jul 08 '24 13:07 Nuclear6

Hi @451222664

I am also getting the same error!

I pasted the logs below; it feels like an issue with ollama. Please confirm you are also getting the same logs.

Screenshot 2024-07-08 214205

Jul 08 '24 16:07 AnandMoorthy

Hi @451222664. Judging by the response you posted, yes: the LLM you're using is ignoring the output format we expect and is being more "chatty". I would suggest doing some prompt tuning to force the LLM into the format we need for parsing.
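
Before tuning the prompt, it can help to check a raw response against the expected format. The sketch below assumes the delimiters of the default GraphRAG entity-extraction prompt (`<|>` between fields, `##` between records, `<|COMPLETE|>` at the end); only `<|COMPLETE|>` is confirmed by this thread, so adjust the constants if your prompt differs. `parse_records` is a hypothetical helper, not part of GraphRAG, and the well-formed sample is invented.

```python
import re

# Assumed delimiters; only COMPLETION_DELIMITER appears in this thread.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(response: str) -> list[list[str]]:
    """Split a raw LLM response into records of delimiter-separated fields.

    Only parenthesised chunks that contain the tuple delimiter are kept,
    so free-form chatter produces no records.
    """
    # Ignore anything after the completion marker.
    body = response.split(COMPLETION_DELIMITER)[0]
    records = []
    for chunk in body.split(RECORD_DELIMITER):
        match = re.search(r"\((.*)\)", chunk, re.DOTALL)
        if match and TUPLE_DELIMITER in match.group(1):
            parts = match.group(1).split(TUPLE_DELIMITER)
            records.append([part.strip().strip('"') for part in parts])
    return records


# The response quoted earlier in this thread.
CHATTY = (
    "<|COMPLETE|> \n\n\n"
    "Let me know if you'd like to try another example!  I'm ready when you are."
)

# Invented example of a response that follows the assumed format.
WELL_FORMED = (
    '("entity"<|>ALICE<|>PERSON<|>Alice is an engineer)##\n'
    '("entity"<|>ACME<|>ORGANIZATION<|>Acme is a company)##\n'
    '("relationship"<|>ALICE<|>ACME<|>Alice works at Acme<|>8)<|COMPLETE|>'
)

if __name__ == "__main__":
    print("chatty records:", len(parse_records(CHATTY)))
    print("well-formed records:", len(parse_records(WELL_FORMED)))
```

Under these assumptions the chatty response yields no records at all, which is consistent with an empty entity graph reaching the clustering step.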

Jul 09 '24 21:07 AlonsoGuevara

It turns out ollama was not started properly; restarting the service fixed the issue.

Jul 12 '24 18:07 AnandMoorthy

Hi! We are consolidating alternate model issues here: https://github.com/microsoft/graphrag/issues/657

Jul 22 '24 23:07 AlonsoGuevara

I am also using an ollama model (llama3.2) and facing the same issue.

Oct 18 '24 15:10 RajSharma1902