
Query bugs

Open lllddd79 opened this issue 9 months ago • 9 comments

Hi, I'm sorry to bother you.

While testing query results using main.py, I encountered an issue. Specifically, when the number of questions exceeds two, I receive an error message stating, "Event loop is closed."

RuntimeError: Event loop is closed

Traceback (most recent call last):
  File "/home/lcy/GraphRAG-exp/main.py", line 81, in <module>
    save_path = wrapper_query(query_dataset, digimon, result_dir)
  File "/home/lcy/GraphRAG-exp/main.py", line 40, in wrapper_query
    res = asyncio.run(digimon.query(query["question"]))
  File "/home/lcy/anaconda3/envs/rag/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/lcy/anaconda3/envs/rag/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/lcy/GraphRAG-exp/Core/GraphRAG.py", line 290, in query
    response = await self._querier.query(query)
  File "/home/lcy/GraphRAG-exp/Core/Query/BaseQuery.py", line 22, in query
    context = await self._retrieve_relevant_contexts(query=query)
  File "/home/lcy/GraphRAG-exp/Core/Query/BasicQuery.py", line 23, in _retrieve_relevant_contexts
    entities_context, relations_context, text_units_context = await self._retrieve_relevant_contexts_global_keywords(
  File "/home/lcy/GraphRAG-exp/Core/Query/BasicQuery.py", line 168, in _retrieve_relevant_contexts_global_keywords
    use_entities = await self._retriever.retrieve_relevant_content(seed=edge_datas, type=Retriever.ENTITY,
  File "/home/lcy/GraphRAG-exp/Core/Retriever/MixRetriever.py", line 19, in retrieve_relevant_content
    return await self.retrievers[type].retrieve_relevant_content(mode=mode, **kwargs)
  File "/home/lcy/GraphRAG-exp/Core/Retriever/BaseRetriever.py", line 25, in retrieve_relevant_content
    return await retrieve_fun(self, **kwargs)
  File "/home/lcy/GraphRAG-exp/Core/Retriever/EntitiyRetriever.py", line 192, in _find_relevant_entities_by_relationships
    for e in seed:
TypeError: 'NoneType' object is not iterable

ERROR conda.cli.main_run:execute(125): conda run python /home/lcy/GraphRAG-exp/main.py -opt Option/Method/LightRAG.yaml -dataset_name datasets/test failed. (See above for error)

I am using the LightRAG.yaml file for the configuration settings. Could you help me?

lllddd79 avatar Mar 28 '25 07:03 lllddd79
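[Editor's note] The "Event loop is closed" part of the traceback is consistent with calling asyncio.run() once per question while some client object caches a reference to the first (now closed) event loop. The sketch below is a minimal, hypothetical reproduction of that failure mode; CachedClient and its fetch method are illustrative stand-ins, not code from the GraphRAG repository.

```python
import asyncio

class CachedClient:
    """Simulates a library object that binds a resource to the loop it first runs on."""
    def __init__(self):
        self._loop = None

    async def fetch(self, q):
        if self._loop is None:
            # Bound to whichever loop is running on first use.
            self._loop = asyncio.get_running_loop()
        if self._loop.is_closed():
            # What an aiohttp-style client effectively raises on reuse.
            raise RuntimeError("Event loop is closed")
        return f"result for {q}"

client = CachedClient()

def query(q):
    # Each asyncio.run() creates, runs, and then CLOSES a fresh loop,
    # so the second call sees the stale loop cached during the first.
    return asyncio.run(client.fetch(q))

print(query("q1"))   # first call succeeds; loop gets cached
try:
    query("q2")      # second call fails with the cached, closed loop
except RuntimeError as e:
    print(e)
```

The fix direction (used later in this thread) is to keep a single event loop alive for the whole batch of questions instead of one asyncio.run() per question.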

This is because your seed is None; that is, no relationships were retrieved. Could you show me your question?

JayLZhou avatar Mar 30 '25 05:03 JayLZhou

Thanks for your reply! I hit this issue when the program processes the second question in the multihop-rag dataset, i.e., {"question":"Which individual is implicated in both inflating the value of a Manhattan apartment to a figure not yet achieved in New York City's real estate history, according to 'Fortune', and is also accused of adjusting this apartment's valuation to compensate for a loss in another asset's worth, as reported by 'The Age'?","answer":"Donald Trump","label":"Donald Trump"}

I think I have built the graph successfully. The output is as follows.

####################################################################################################
2025-03-30 06:23:40.535 | INFO | Core.Chunk.DocChunk:build_chunks:27 - Starting chunk the given documents
2025-03-30 06:23:40.813 | INFO | Core.Storage.ChunkKVStorage:_persist:105 - Writing data into ./datasets/multihop-rag/rkg_graph/chunk_storage_chunk_data_idx.pkl and ./datasets/multihop-rag/rkg_graph/chunk_storage_chunk_data_key.pkl
2025-03-30 06:23:40.836 | INFO | Core.Chunk.DocChunk:build_chunks:74 - ✅ Finished the chunking stage
2025-03-30 06:23:40.845 | INFO | Core.GraphRAG:_update_costs_info:205 - Chunking stage cost: Total prompt token: 0, Total completeion token: 0, Total cost: 0
2025-03-30 06:23:40.846 | INFO | Core.GraphRAG:_update_costs_info:207 - Chunking time(s): 0.31
2025-03-30 06:23:40.846 | INFO | Core.Graph.BaseGraph:build_graph:41 - Starting build graph for the given documents
2025-03-30 06:23:40.846 | INFO | Core.Storage.NetworkXStorage:load_nx_graph:27 - Attempting to load the graph from: ./datasets/multihop-rag/rkg_graph/graph_storage_nx_data.graphml
2025-03-30 06:23:41.341 | INFO | Core.Storage.NetworkXStorage:load_nx_graph:31 - Successfully loaded graph from: ./datasets/multihop-rag/rkg_graph/graph_storage_nx_data.graphml with 14551 nodes and 7442 edges
2025-03-30 06:23:41.341 | INFO | Core.Graph.BaseGraph:build_graph:50 - ✅ Finished the graph building stage
2025-03-30 06:23:41.341 | INFO | Core.GraphRAG:_update_costs_info:205 - Build Graph stage cost: Total prompt token: 0, Total completeion token: 0, Total cost: 0
2025-03-30 06:23:41.341 | INFO | Core.GraphRAG:_update_costs_info:207 - Build Graph time(s): 0.50
2025-03-30 06:23:41.622 | INFO | Core.Index.BaseIndex:build_index:13 - Starting insert elements of the given graph into vector database
2025-03-30 06:23:41.622 | INFO | Core.Index.BaseIndex:build_index:17 - Loading index from the file ./datasets/multihop-rag/rkg_graph/entities_vdb
2025-03-30 06:24:34.659 | INFO | Core.Index.BaseIndex:build_index:29 - ✅ Finished starting insert entities of the given graph into vector database
2025-03-30 06:24:34.717 | INFO | Core.Index.BaseIndex:build_index:13 - Starting insert elements of the given graph into vector database
2025-03-30 06:24:34.717 | INFO | Core.Index.BaseIndex:build_index:17 - Loading index from the file ./datasets/multihop-rag/rkg_graph/relations_vdb
2025-03-30 06:25:02.264 | INFO | Core.Index.BaseIndex:build_index:29 - ✅ Finished starting insert entities of the given graph into vector database
2025-03-30 06:25:02.265 | INFO | Core.GraphRAG:_update_costs_info:205 - Index Building stage cost: Total prompt token: 0, Total completeion token: 0, Total cost: 0
2025-03-30 06:25:02.265 | INFO | Core.GraphRAG:_update_costs_info:207 - Index Building time(s): 80.92
2025-03-30 06:25:02.265 | INFO | Core.GraphRAG:_build_retriever_context:176 - Building retriever context for the current execution
0
2025-03-30 06:25:06.414 | INFO | Core.Common.CostManager:update_cost:136 - prompt_tokens: 411, completion_tokens: 55
2025-03-30 06:25:07.768 | INFO | Core.Query.BasicQuery:_retrieve_relevant_contexts_global_keywords:172 - Global query uses 1 entities, 5 relations, 3 text units
2025-03-30 06:25:10.599 | INFO | Core.Common.CostManager:update_cost:136 - prompt_tokens: 411, completion_tokens: 55
2025-03-30 06:25:10.976 | INFO | Core.Query.BasicQuery:_retrieve_relevant_contexts_local:75 - Using 5 entities, 42 relations, 4 text units
2025-03-30 06:25:13.807 | INFO | Core.Common.CostManager:update_cost:136 - prompt_tokens: 411, completion_tokens: 55
2025-03-30 06:25:14.017 | INFO | Core.Query.BasicQuery:_retrieve_relevant_contexts_global_keywords:172 - Global query uses 1 entities, 5 relations, 3 text units
2025-03-30 06:25:48.506 | INFO | Core.Common.CostManager:update_cost:136 - prompt_tokens: 8892, completion_tokens: 83
1
2025-03-30 06:25:52.894 | INFO | Core.Common.CostManager:update_cost:136 - prompt_tokens: 432, completion_tokens: 50
2025-03-30 06:25:52.895 | ERROR | Core.Retriever.RelationshipRetriever:_find_relevant_relations_vdb:48 - Failed to find relevant relationships: Event loop is closed
Traceback (most recent call last):

########################

My settings file for LightRAG is as follows.

################################# Working settings  #################################
# Basic Config
use_entities_vdb: True
use_relations_vdb: True  # Only set True for LightRAG
use_subgraphs_vdb: False
llm_model_max_token_size: 32768
use_entity_link_chunk: False  # Only set True for HippoRAG and FastGraphRAG
enable_graph_augmentation: False

# Data

index_name: rkg_graph

vdb_type: vector  # vector/colbert

# Chunk Config 
chunk:
  chunk_token_size: 1200
  chunk_overlap_token_size: 100
  token_model: gpt-3.5-turbo
  chunk_method: chunking_by_token_size

# Graph Config 
graph:
  # enable LightRAG
    enable_edge_keywords: True
    graph_type: rkg_graph # rkg_graph/er_graph/tree_graph/passage_graph
    force: False
    # Building graph
    extract_two_step: True
    max_gleaning: 1
    enable_entity_description: True
    enable_entity_type: False
    enable_edge_description: True
    enable_edge_name: True




# Retrieval Config 
retriever:
    query_type: basic
    enable_local: False
    use_entity_similarity_for_ppr: True
    top_k_entity_for_ppr: 8
    node_specificity: True
    damping: 0.1
    top_k: 5

query: 
    query_type: qa
    only_need_context: False
    enable_hybrid_query: True
    augmentation_ppr: True
    response_type: Multiple Paragraphs
    level: 2
    community_information: True
    retrieve_top_k: 20
    # naive search
    naive_max_token_for_text_unit: 12000
    # local search
    local_max_token_for_text_unit: 4000  # 12000 * 0.33
    max_token_for_text_unit: 4000
    use_keywords: True
    use_community: False

    entities_max_tokens: 2000
    relationships_max_tokens: 2000

    max_token_for_local_context: 4800  # 12000 * 0.4
    local_max_token_for_community_report: 3200  # 12000 * 0.27
    local_community_single_one: False
    # global search
    use_global_query: True
    global_min_community_rating:  0
    global_max_consider_community: 512
    global_max_token_for_community_report: 16384
    max_token_for_global_context: 4000
    global_special_community_map_llm_kwargs: {"response_format": {"type": "json_object"}}
    # For IR-COT
    max_ir_steps: 2

Here is the Config2.yaml

################################# Config2 settings #################################

# LLM settings for Ollama
llm:
  api_type: "open_llm" # open_llm or openai
  base_url: 'http://localhost:11434/v1'
#  model: "qwen2.5:14b"
  model: "llama3.1:latest"
  api_key: "sk-XXXXXXXXXXXXXXXX"


embedding:
  api_type: "ollama"  # hf/ollama/openai.
  base_url: 'http://localhost:11434'
  api_key: "YOUR_API_KEY"
  model: "nomic-embed-text:latest"
  cache_dir: "/embedding_cache/"  # Cache directory for embedding models
  dimensions: 1024
  max_token_size: 8102
  embed_batch_size: 128
  embedding_func_max_async: 16

data_root:  "./" # Root directory for data



working_dir: ./ # Result directory for the experiment
#exp_name:  "light_rag_exp" # Experiment name
exp_name:  "LGraphRAG_exp"
# 

lllddd79 avatar Mar 30 '25 06:03 lllddd79

Hi, I want to know: is your node_datas null? Sorry for the late response.

JayLZhou avatar Apr 02 '25 12:04 JayLZhou

Thank you for your reply. Based on my tests, I can confirm that node_datas is not null when I hit the problem; rather, edge_datas in the function _retrieve_relevant_contexts_global_keywords is None.

Additionally, I've observed that when I run the questions one by one in debug mode, everything works. This leads me to suspect that the problem is related to asynchronous programming. Since I'm not familiar with asynchronous programming, I'm unable to solve this issue myself.

lllddd79 avatar Apr 06 '25 01:04 lllddd79

Sorry for the late response. We will check how to support batch updating for question evaluation as soon as possible.

JayLZhou avatar Apr 06 '25 07:04 JayLZhou

Sorry, I have the same problem as you. Did you manage to solve it?

ysq111333 avatar Apr 07 '25 14:04 ysq111333

The issue isn't fully resolved yet. I modified the def wrapper_query(query_dataset, digimon, result_dir) function in main.py to be asynchronous, replacing res = asyncio.run(digimon.query(query["question"])) with res = await digimon.query(query["question"]). This lets the program process more than two questions, but I still hit a bug when processing the 104th question in the multihop-rag dataset.

lllddd79 avatar Apr 08 '25 01:04 lllddd79
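[Editor's note] The fix described in the comment above can be sketched as follows: make wrapper_query itself async and drive every question inside one event loop, calling asyncio.run() exactly once for the whole dataset. The names digimon and query_dataset follow the thread; the result-saving logic is omitted here, and the dummy entry point is only illustrative.

```python
import asyncio

async def wrapper_query(query_dataset, digimon, result_dir):
    """Run all questions inside a single running event loop."""
    results = []
    for query in query_dataset:
        # await inside the one loop, instead of asyncio.run() per question,
        # so loop-bound resources are never reused across closed loops.
        res = await digimon.query(query["question"])
        results.append(res)
    # Saving results under result_dir is omitted in this sketch.
    return results

# Entry point: exactly one asyncio.run for the whole dataset, e.g.:
# save_path = asyncio.run(wrapper_query(query_dataset, digimon, result_dir))
```

This avoids the "Event loop is closed" error for multiple questions, though as the comment notes, a later failure (the 104th question) may have a separate cause.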

Thank you! That is helpful for me! Do you think it's possible there's an issue with the environment setup? A version conflict occurred when I was configuring requirements.txt.

ysq111333 avatar Apr 08 '25 05:04 ysq111333

Hi, we have updated the new requirement. Please see: https://github.com/JayLZhou/GraphRAG/blob/master/experiment.yml

JayLZhou avatar Apr 08 '25 06:04 JayLZhou