[Bug]: ValueError: Columns must be same length as key
Describe the bug
00:58:35,677 graphrag.index.verbs.graph.clustering.cluster_graph WARNING Graph has no nodes
00:58:35,679 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:58:35,682 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:58:35,682 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\Ryan\AppData\Roaming\Python\Python311\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Steps to reproduce
Reproduced the demo with a locally deployed large language model, and this error occurred.
Expected Behavior
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues:
Hi @yuangtao Could you please share your config file?
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: qwen2-0.5b
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 1024 # 4000
  request_timeout: 180.0
  api_base: http://localhost:1234/v1

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text-v1.5.Q2_K
    api_base: http://localhost:1234/v1
I used LM Studio for local deployment.
Same issue here. Is this problem related to using a non-OpenAI model?
I may have found the reason. I use the agicto API (api_base: https://api.agicto.cn/v1) with deepseek-chat & text-embedding-3-small, and it works. My issue of "Columns must be same length as key, Errors occurred during the pipeline run" was probably caused by a malformed api_base: I had written it without the version suffix.
The api_base path should end with /v1.
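For anyone who wants to verify this quickly, here is a small sanity check (a sketch, not GraphRAG code; the base URL and model name are the ones from this thread, and the API key is a placeholder). An OpenAI-compatible server serves chat completions under the versioned root, so api_base must end in /v1 for any reply to come back:

from openai import OpenAI

# Assumption: the server speaks the OpenAI v1 API; note the /v1 suffix.
client = OpenAI(base_url="https://api.agicto.cn/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)  # any reply means the base URL is correct

If this call fails with a 404 or a connection error, GraphRAG will get empty or unparsable answers from the model, which is exactly what triggers the clustering crash below.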
I have dug into the issue a little. The problem occurs when the LLM generates an empty answer, or when there is a problem parsing it.
In that case, in the module cluster_graph.py, graphrag tries to execute (line 122):
output_df[[level_to, to]] = pd.DataFrame(
    output_df[to].tolist(), index=output_df.index
)
with typically:
level_to = "level"
to = "clustered_graph"
output_df.index = RangeIndex(start=0, stop=1, step=1)
and output_df[to] containing only NaN, because the LLM produced nothing that could be clustered. This doesn't work, since the DataFrame built from output_df[to].tolist() doesn't have the right number of columns to match the two-element key [level_to, to].
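Here is a minimal standalone repro of that mismatch (a sketch using the same column names as above, independent of GraphRAG):

import numpy as np
import pandas as pd

# One NaN row is what an empty LLM answer leaves behind.
output_df = pd.DataFrame({"clustered_graph": [np.nan]})
# tolist() gives [nan], so the right-hand side is a single-column DataFrame,
# while the key names two columns:
output_df[["level", "clustered_graph"]] = pd.DataFrame(
    output_df["clustered_graph"].tolist(), index=output_df.index
)
# ValueError: Columns must be same length as key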
Now there are two choices:
- Either GraphRAG should stop when the LLM doesn't provide a usable answer, so that this piece of code is never executed;
- Or the library should take this edge case into account, for example:
# to_insert is presumably computed first from the raw column (my assumption;
# the original comment used it without showing its definition):
to_insert = pd.DataFrame(output_df[to].tolist(), index=output_df.index)
if to_insert.isna().all()[0]:
    # The LLM produced no usable graph: emit an empty clustering result
    # instead of crashing on the column-length mismatch.
    output_df.drop(columns=[community_map_to], inplace=True)
    output_df[[level_to, to]] = pd.DataFrame([([], "")])
    return TableContainer(table=output_df)
else:
    output_df[[level_to, to]] = to_insert
In both cases there should be a more explicit message in the log than this pandas error.
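For the first option, a hedged sketch of what an explicit guard inside cluster_graph could look like (variable names are the ones above; the error wording is my own, not the library's):

# Sketch only: fail fast with an actionable message instead of the pandas error.
if output_df[to].isna().all():
    raise ValueError(
        "cluster_graph: entity extraction produced an empty graph "
        "(the LLM returned no parsable output). Check the model and "
        "api_base configuration before re-running the pipeline."
    )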
We see this issue filed commonly with models that return an unexpected format. Routing to the consolidated alternate model providers issue #657.
But I do use Azure OpenAI. So it's not only an alternate model issue.
If it’s helpful to others, I don’t think this issue is related to the model itself. I got this while running autotuning on an empty file. I’ve seen similar errors such as:
- ValueError: Columns must be same length as key
- KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
- KeyError: 'title' (from pandas\core\groupby\grouper.py)
All of these occur when the files being indexed don't contain enough meaningful text for GraphRAG to extract any entities or relationships (like an empty file, or one with very little legible content). In such cases, the extraction step returns empty DataFrames, which then cause downstream failures during merging or grouping.
It would be great if GraphRAG could handle this case more gracefully, for example by skipping empty files, checking for malformed DataFrames returned from the LLM step, or at least throwing a better exception than breaking down inside pandas as mentioned above. A sketch of the skip-empty-files idea follows.
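As a workaround one can run a small pre-filter over the input directory before indexing (a sketch; MIN_CHARS and iter_indexable_files are my own names, and the threshold is arbitrary, not a GraphRAG setting):

from pathlib import Path

MIN_CHARS = 200  # hypothetical threshold; tune for your corpus

def iter_indexable_files(input_dir: str):
    """Yield only .txt files with enough text to plausibly contain entities."""
    for path in sorted(Path(input_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore").strip()
        if len(text) < MIN_CHARS:
            print(f"Skipping {path.name}: too little content for entity extraction")
            continue
        yield path

# Example: list the files that would survive the filter.
for f in iter_indexable_files("input"):
    print("will index:", f)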