Describe the bug
```
Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Error running pipeline!

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  ... (remaining frames identical to the traceback above) ...
ValueError: Columns must be same length as key
```
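For context, the pandas failure at the bottom of this traceback is easy to reproduce in isolation: assigning to a list of two target columns with a DataFrame that has a different number of columns raises exactly this error. A minimal sketch (the column names here are illustrative stand-ins for `level_to` and `to` in `cluster_graph.py`; a zero-column result is what an empty clustering output would look like):

```python
import pandas as pd

# Assigning to two target columns with a DataFrame that has zero columns
# (what happens when graph clustering yields no results) triggers the error.
output_df = pd.DataFrame({"entity_graph": ["g1", "g2"]})
empty_result = pd.DataFrame([[] for _ in range(len(output_df))])  # 2 rows, 0 columns

try:
    output_df[["level", "clustered_graph"]] = empty_result
except ValueError as err:
    print(err)  # -> Columns must be same length as key
```

This suggests the error is a downstream symptom: the clustering verb received an empty or malformed graph, so the DataFrame it builds has no columns to assign.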
Steps to reproduce
I was using a local Ollama model with the tool. It ran fine and loaded the test file before the error occurred.
Expected Behavior
The tool should have proceeded with the next step, "create_base_text_units", rather than ceasing operation. It appears to be a bug in the graphing function.
GraphRAG Config Used
```yaml
encoding_model: cl100k_base
skip_workflows: []

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: command-r-plus:104b-q4_0
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1
  api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2:7b-instruct
    # api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 1 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
```
The remainder of the settings is unchanged from the defaults.
Logs and screenshots
Additional Information
- GraphRAG Version: Current as of this posting
- Operating System: Windows 10
- Python Version: 3.10
- Related Issues:
Hi! My general rule of thumb when facing this issue is:
- Check the outputs of entity extraction; this will show whether the graph is empty.
- If the graph is empty, the cause is usually either faulty LLM responses (unparseable output) or failing LLM calls.

Can you please check your cache entries for entity extraction to see whether the LLM is returning faulty responses?
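A quick way to act on this advice is to inspect the cache directory on disk. A minimal sketch (the helper name is mine, and the `cache/entity_extraction` path assumes the default cache location under your project root; adjust if you changed it):

```python
from pathlib import Path

def check_entity_cache(root: str = ".") -> int:
    """Count cached entity-extraction responses under root/cache/entity_extraction.

    An empty directory usually means the LLM calls themselves are failing;
    non-empty files with unparseable content point at the model's output format.
    """
    cache = Path(root) / "cache" / "entity_extraction"
    files = [p for p in cache.glob("*") if p.is_file()] if cache.exists() else []
    if not files:
        print("entity_extraction cache is empty -> LLM calls are likely failing")
    else:
        # Peek at the first response to eyeball whether it is parseable.
        print(files[0].read_text(encoding="utf-8", errors="replace")[:500])
    return len(files)
```

Running `check_entity_cache()` from the indexing root distinguishes the two failure modes the comment above describes.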
The entity extraction directory is empty. I tried two other models and got the same result.
Facing the same thing: cache/entity_extraction is empty, and the exact same error appears in the logs.
Same error; the entity extraction directory is empty.
Same error, but the entity_extraction directory is not empty.
settings.yaml
Same error, but the entity_extraction and summarize_descriptions directories are not empty either.
Consolidating alternate model issues here: #657
Has this been resolved?
Any fix for this yet?
I use OpenAI gpt-4o-mini. After I reduced the chunk size from 1000 to 200 and the overlap to 10, it works for me!
```yaml
chunks:
  size: 200
  overlap: 10
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
```
Same issue here. I used gpt-4o-mini along with the default text-embedding-3-small, with max_tokens set to 1700.
Any official solution yet?
I also encountered this issue, and the root cause is that the results extracted by your model are not good enough. On one hand, you can choose a more powerful large model; on the other hand, you can reduce `llm: max_tokens` in settings.yaml, or reduce `chunks: size` and `overlap` as well.
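Concretely, the knobs mentioned above live in settings.yaml. An illustrative (not prescriptive) adjustment, using values commenters in this thread reported working:

```yaml
llm:
  max_tokens: 1700 # try lowering this if extraction output looks truncated
chunks:
  size: 200        # reduced from 1000, as reported above
  overlap: 10
```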

