
Indexer Error

Open · kinfey opened this issue 1 year ago • 6 comments

Describe the bug

When I run the indexer, it always gives me this error:

{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/series.py\", line 4924, in apply\n    ).apply()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1427, in apply\n    return self.apply_standard()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n    mapped = obj._map_values(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/base.py\", line 921, in _map_values\n    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n    return lib.map_infer(values, mapper, convert=convert)\n  File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in <lambda>\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n    clusters = run_leiden(graph, strategy)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n    node_id_to_community_map = _compute_leiden_communities(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n    community_mapping = hierarchical_leiden(\n  File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x330776e60>\", line 304, in hierarchical_leiden\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n    hierarchical_clusters_native = gn.hierarchical_leiden(\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
 
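For context: the `EmptyNetworkError` at the bottom of the trace is raised by graspologic's Leiden clustering when the graph it receives has no nodes or edges, i.e. the entity-extraction step produced an empty graph. A minimal sketch reproducing the same exception outside GraphRAG (assuming `graspologic` and `networkx` are installed; the empty graph stands in for a failed extraction):

```python
import networkx as nx
from graspologic.partition import hierarchical_leiden

# An empty graph is effectively what cluster_graph receives when entity
# extraction yields no entities or relationships.
empty_graph = nx.Graph()
hierarchical_leiden(empty_graph)  # raises leiden.EmptyNetworkError
```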

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: phi-3-mini
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 13000
  request_timeout: 2800.0
  api_base: http://localhost:5146/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: jinaai
    request_timeout: 2800.0
    api_base: http://localhost:5146/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
  model_supports_json: false 

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 0.1.1
  • Operating System: macOS
  • Python Version: 3.10.12
  • Related Issues:

kinfey · Jul 09 '24 17:07

Got a similar error... let me know if you find a solution. Thanks in advance!

sriharshaguthikonda · Jul 09 '24 20:07

Hi,

My general rule of thumb when facing this issue is:

  • Check the outputs of the entity extraction step; these will show whether the graph is empty.
  • If the graph is empty, the cause is usually either faulty LLM responses (unparseable output) or failed LLM calls.

Can you please check and share any of your LLM responses from the cache folder?
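(For anyone triaging the same way, a minimal sketch for eyeballing the cached responses. It assumes only the file cache from the config above, i.e. text/JSON files under `cache/`; the preview logic is a hypothetical helper, not part of GraphRAG.)

```python
from pathlib import Path

# Walk the cache folder from the config above and preview each cached LLM
# response, to spot empty or unparseable outputs.
cache_dir = Path("cache")
for f in sorted(cache_dir.rglob("*")):
    if f.is_file():
        text = f.read_text(encoding="utf-8", errors="replace")
        preview = text[:200].replace("\n", " ")
        print(f"{f.relative_to(cache_dir)} ({len(text)} chars)")
        print(f"  {preview}...")
```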

AlonsoGuevara · Jul 09 '24 21:07

cache.zip — @AlonsoGuevara, this is my cache folder.

kinfey · Jul 09 '24 23:07

Check your log report at output/<latest date-time>/reports. Most likely the LLM was not working...
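(A quick way to pull up the newest run's logs — a minimal sketch; it assumes the `output/${timestamp}/reports` layout from the config above and that the run wrote plain-text `*.log` files there.)

```python
from pathlib import Path

# Pick the most recent timestamped run under output/ and print the tail of
# every log file in its reports folder.
latest_run = sorted(Path("output").iterdir())[-1]
for log in (latest_run / "reports").glob("*.log"):
    lines = log.read_text(encoding="utf-8").splitlines()
    print(f"--- {log.name}: last 20 lines ---")
    print("\n".join(lines[-20:]))
```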

cove9988 · Jul 10 '24 01:07

But it does generate something on the backend.
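(One way to double-check that is to hit the `api_base` from the config above directly — a minimal sketch; it assumes the `openai` v1.x Python client and that the local server speaks the OpenAI chat API.)

```python
from openai import OpenAI

# Point the client at the local endpoint from settings.yaml.
client = OpenAI(base_url="http://localhost:5146/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="phi-3-mini",
    messages=[{"role": "user", "content": 'Reply with valid JSON: {"ok": true}'}],
)
# Unparseable output here is exactly what leads to an empty graph later.
print(resp.choices[0].message.content)
```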

kinfey · Jul 11 '24 01:07

Can you post the detailed log file?

Nuclear6 · Jul 11 '24 03:07

Consolidating alternate model issues here: #657

natoverse avatar Jul 22 '24 23:07 natoverse