Indexer Error
Describe the bug
When I run the indexer, it always gives me this error:
{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/series.py\", line 4924, in apply\n ).apply()\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1427, in apply\n return self.apply_standard()\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n mapped = obj._map_values(\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/base.py\", line 921, in _map_values\n return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n return lib.map_infer(values, mapper, convert=convert)\n File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in <lambda>\n results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n clusters = run_leiden(graph, strategy)\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n node_id_to_community_map = _compute_leiden_communities(\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n community_mapping = hierarchical_leiden(\n File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x330776e60>\", line 304, in hierarchical_leiden\n File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n hierarchical_clusters_native = gn.hierarchical_leiden(\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: phi-3-mini
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 13000
  request_timeout: 2800.0
  api_base: http://localhost:5146/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: jinaai
    request_timeout: 2800.0
    api_base: http://localhost:5146/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
  model_supports_json: false

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 0.1.1
- Operating System: macOS
- Python Version: 3.10.12
- Related Issues:
I got a similar error... let me know if you find a solution. Thanks in advance!
Hi
My general rule of thumb when facing this issue is:
- Check the outputs of the entity extraction step; this will show whether the graph is empty (see the sketch after this comment).
- If the graph is empty, the cause is usually either faulty LLM responses (unparseable output) or failed LLM calls.
Can you please check and share some of your LLM responses from the cache folder?
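A quick way to run that check (the run-folder layout, parquet name, and column name below are assumptions based on GraphRAG 0.1.x defaults; list your artifacts folder and adapt if your run produced different files):

import networkx as nx
import pandas as pd
from pathlib import Path

# Pick the newest indexing run's artifacts folder.
latest_run = sorted(Path("output").glob("*/artifacts"))[-1]

# Load the entity-extraction output and parse the GraphML it stores.
df = pd.read_parquet(latest_run / "create_base_extracted_entities.parquet")
graph = nx.parse_graphml(df["entity_graph"][0])
print(f"nodes={graph.number_of_nodes()} edges={graph.number_of_edges()}")

If both counts are zero, cluster_graph will fail with exactly the EmptyNetworkError above.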
@AlonsoGuevara This is my cache folder: cache.zip
Check your log report at output/<date-time>/reports. Most likely the LLM was not working correctly, even though it can still generate something in the backend; a quick endpoint check is sketched below.
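If the report does point at LLM failures, it is worth sanity-checking the local endpoint from the config directly. A minimal sketch using the openai v1 Python client against the api_base and model configured above (the api_key value is a placeholder; local servers usually ignore it):

from openai import OpenAI

# Same endpoint and model name as in the GraphRAG config.
client = OpenAI(base_url="http://localhost:5146/v1", api_key="placeholder")

resp = client.chat.completions.create(
    model="phi-3-mini",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.choices[0].message.content)

If this call errors out or returns malformed text, entity extraction will quietly produce an empty graph and the indexer will fail at cluster_graph as shown above.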
Can you post the detailed log file?
Consolidating alternate model issues here: #657