graphrag
[Bug]: Multi-search failed
Do you need to file an issue?
- [ ] I have searched the existing issues and this bug is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
When I test the multi-search feature with graphrag 2.0.0, an exception is raised.
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
# Paste your config here
vector_store:
  first_index:
    type: lancedb
    db_uri: first_index/output/lancedb
    container_name: default
    overwrite: True
  second_index:
    type: lancedb
    db_uri: second_index/output/lancedb
    container_name: default
    overwrite: True
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
### Input settings ###
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"
chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]
### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"
reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"
output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
outputs:
  first_index:
    type: file # [file, blob, cosmosdb]
    base_dir: "first_index/output"
  second_index:
    type: file # [file, blob, cosmosdb]
    base_dir: "second_index/output"
### Workflow settings ###
extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]
extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)
snapshots:
  graphml: false
  embeddings: false
### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"
global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
Logs and screenshots
Traceback (most recent call last):
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x14e49f640> is already entered
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1587, in _exec
    runpy._run_module_as_main(module_name, alter_argv=False)
  File "
Additional Information
- GraphRAG Version: 2.0.0
- Operating System:
- Python Version: 3.12
- Related Issues:
+1
+1 for me using version 2.1.0
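I ran into this as well. The cause turned out to be the following code in the update_context_data function in graphrag/utils/api.py (every case appears to be affected):
    for entry in context_data[key]
Try changing it to:
    for entry in context_data[key].to_dict(orient='records')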
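Why that one-line change matters can be seen with a toy DataFrame. Iterating a pandas DataFrame directly yields its column labels rather than its rows, so `entry["id"]` ends up indexing a plain string. The sketch below uses made-up data (the `id` and `title` columns are illustrative, not taken from GraphRAG) and assumes that `context_data[key]` is a DataFrame, as the suggested fix implies.

```python
import pandas as pd

# Toy stand-in for context_data["reports"]; column names are illustrative only.
reports = pd.DataFrame({"id": ["0", "1"], "title": ["Report A", "Report B"]})

# Iterating a DataFrame directly yields column labels, not rows.
labels = list(reports)

# Indexing a label (a plain str) with a string key raises TypeError.
try:
    for entry in reports:
        entry["id"]
    direct_error = None
except TypeError as exc:
    direct_error = exc

# to_dict(orient="records") yields one dict per row, so entry["id"] works.
records = reports.to_dict(orient="records")
ids = [int(entry["id"]) for entry in records]

print(labels)
print(type(direct_error).__name__)
print(ids)
```

If `context_data[key]` were already a list of dicts, the original loop would work and `.to_dict` would fail instead, so it may be worth checking which type your installed version actually passes in.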
In fact, the same error occurred with all keys, so I resolved it by modifying the update_context_data function as follows. I hope this helps.
from typing import Any


def update_context_data(
    context_data: Any,
    links: dict[str, Any],
) -> Any:
    """
    Update context data with the links dict so that it contains both the index name and community id.

    Parameters
    ----------
    - context_data (str | list[pd.DataFrame] | dict[str, pd.DataFrame]): The context data to update.
    - links (dict[str, Any]): A dictionary of links to the original dataframes.

    Returns
    -------
    str | list[pd.DataFrame] | dict[str, pd.DataFrame]: The updated context data.
    """
    import pandas as pd

    updated_context_data = {}
    for key in context_data:
        data = context_data[key]
        # Convert the DataFrame to a list of row dicts so each entry supports entry["id"].
        entries = data.to_dict(orient="records")
        # Keys without a matching branch below map to an empty list.
        updated_entry = []
        if key == "reports":
            updated_entry = [
                dict(
                    entry,
                    index_name=links["community_reports"][int(entry["id"])]["index_name"],
                    index_id=links["community_reports"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "entities":
            updated_entry = [
                dict(
                    entry,
                    # Keep only the part of the name before the first "-".
                    entity=entry["entity"].split("-")[0],
                    index_name=links["entities"][int(entry["id"])]["index_name"],
                    index_id=links["entities"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "relationships":
            updated_entry = [
                dict(
                    entry,
                    source=entry["source"].split("-")[0],
                    target=entry["target"].split("-")[0],
                    index_name=links["relationships"][int(entry["id"])]["index_name"],
                    index_id=links["relationships"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "claims":
            updated_entry = [
                dict(
                    entry,
                    entity=entry["entity"].split("-")[0],
                    index_name=links["covariates"][int(entry["id"])]["index_name"],
                    index_id=links["covariates"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "sources":
            updated_entry = [
                dict(
                    entry,
                    index_name=links["text_units"][int(entry["id"])]["index_name"],
                    index_id=links["text_units"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        updated_context_data[key] = updated_entry
    return updated_context_data
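To make the expected shapes concrete, here is a small self-contained sketch of the `entities` branch only. The sample DataFrame, the `first_index` and `second_index` names, the id values, and the exact layout of `links` are assumptions inferred from the function above, not taken from GraphRAG's documentation.

```python
import pandas as pd

# Hypothetical context data: ids are strings, entity names carry a "-<suffix>" tag.
entities = pd.DataFrame(
    {"id": ["0", "1"], "entity": ["ALICE-first_index", "BOB-second_index"]}
)

# Hypothetical links: merged id -> source index name and id within that index.
links = {
    "entities": {
        0: {"index_name": "first_index", "id": "12"},
        1: {"index_name": "second_index", "id": "7"},
    }
}

# Same transformation as the "entities" branch of update_context_data.
updated = [
    dict(
        entry,
        entity=entry["entity"].split("-")[0],
        index_name=links["entities"][int(entry["id"])]["index_name"],
        index_id=links["entities"][int(entry["id"])]["id"],
    )
    for entry in entities.to_dict(orient="records")
]

for row in updated:
    print(row)
```

One thing to watch: `split("-")[0]` keeps only the text before the first hyphen, so an entity name that itself contains a hyphen would be cut short by this approach.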