Indexer Error
Describe the bug
When I run the indexer, it always gives me this error:
{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in <lambda>\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n clusters = run_leiden(graph, strategy)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x330776e60>\", line 304, in hierarchical_leiden\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: phi-3-mini
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 13000
  request_timeout: 2800.0
  api_base: http://localhost:5146/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: jinaai
    request_timeout: 2800.0
    api_base: http://localhost:5146/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
  model_supports_json: false

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 0.1.1
- Operating System: macOS
- Python Version: 3.10.12
- Related Issues:
I got a similar error... let me know if you find a solution. Thanks in advance!
Hi
My general rule of thumb when facing this issue is:
- Check the outputs of the entity extraction step; this will show whether the graph is empty (see the sketch after this comment).
- If the graph is empty, the cause is usually either faulty LLM responses (unparseable output) or failed LLM calls.
Can you please check and share some of your LLM responses from the cache folder?
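A quick way to run that check (the run-folder layout, parquet name, and column name below are assumptions based on GraphRAG 0.1.x defaults; list your artifacts folder and adapt if your run produced different files):

import networkx as nx
import pandas as pd
from pathlib import Path

# Pick the newest indexing run's artifacts folder.
latest_run = sorted(Path("output").glob("*/artifacts"))[-1]

# Load the entity-extraction output and parse the GraphML it stores.
df = pd.read_parquet(latest_run / "create_base_extracted_entities.parquet")
graph = nx.parse_graphml(df["entity_graph"][0])
print(f"nodes={graph.number_of_nodes()} edges={graph.number_of_edges()}")

If both counts are zero, cluster_graph will fail with exactly the EmptyNetworkError above.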
@AlonsoGuevara This is my cache folder: cache.zip
Check your log report at output/<date-time>/reports. Most likely the LLM was not working correctly, even though it can still generate something in the backend; a quick endpoint check is sketched below.
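If the report does point at LLM failures, it is worth sanity-checking the local endpoint from the config directly. A minimal sketch using the openai v1 Python client against the api_base and model configured above (the api_key value is a placeholder; local servers usually ignore it):

from openai import OpenAI

# Same endpoint and model name as in the GraphRAG config.
client = OpenAI(base_url="http://localhost:5146/v1", api_key="placeholder")

resp = client.chat.completions.create(
    model="phi-3-mini",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.choices[0].message.content)

If this call errors out or returns malformed text, entity extraction will quietly produce an empty graph and the indexer will fail at cluster_graph as shown above.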
Can you post the detailed log file?
Consolidating alternate model issues here: #657