
[Bug]: Multi-search failed

Open arkonchen opened this issue 9 months ago • 4 comments

Do you need to file an issue?

  • [ ] I have searched the existing issues and this bug is not already filed.
  • [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When I test the multi-search feature with graphrag 2.0.0, an exception is raised.

Image

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here
vector_store:
  first_index:
    type: lancedb
    db_uri: first_index/output/lancedb
    container_name: default
    overwrite: True
  second_index:
    type: lancedb
    db_uri: second_index/output/lancedb
    container_name: default
    overwrite: True

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

outputs:
  first_index:
    type: file # [file, blob, cosmosdb]
    base_dir: "first_index/output"
  second_index:
    type: file # [file, blob, cosmosdb]
    base_dir: "second_index/output"

### Workflow settings ###

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"

Logs and screenshots

Traceback (most recent call last):
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x14e49f640> is already entered
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1587, in _exec
    runpy._run_module_as_main(module_name, alter_argv=False)
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/bytedance/work/tool/graphrag/graphrag/__main__.py", line 8, in <module>
    app(prog_name="graphrag")
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/typer/main.py", line 339, in __call__
    raise e
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/typer/main.py", line 322, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/typer/core.py", line 740, in main
    return _main(
           ^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/typer/core.py", line 195, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/typer/main.py", line 697, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/work/tool/graphrag/graphrag/cli/main.py", line 432, in _query_cli
    run_local_search(
  File "/Users/bytedance/work/tool/graphrag/graphrag/cli/query.py", line 191, in run_local_search
    response, context_data = asyncio.run(
                             ^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers-pro/pydevd_asyncio/pydevd_nest_asyncio.py", line 138, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers-pro/pydevd_asyncio/pydevd_nest_asyncio.py", line 243, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/asyncio/futures.py", line 202, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/bytedance/miniconda3/envs/graphrag_test/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 33, in wrapper_function
    return await wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/work/tool/graphrag/graphrag/api/query.py", line 688, in multi_index_local_search
    context = update_context_data(result[1], links)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bytedance/work/tool/graphrag/graphrag/utils/api.py", line 175, in update_context_data
    {k: entry[k] for k in entry},
        ~~~~~^^^
TypeError: string indices must be integers, not 'str'

Additional Information

  • GraphRAG Version: 2.0.0
  • Operating System:
  • Python Version: 3.12
  • Related Issues:

arkonchen, Mar 05 '25 03:03

+1

ArianeFire, Mar 07 '25 13:03

+1 for me using version 2.1.0

xtrycatchx, Apr 07 '25 16:04

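I ran into this as well. The cause turned out to be the following code in the update_context_data function in graphrag/utils/api.py (every case appears to be affected):

for entry in context_data[key]

Try changing it to:

for entry in context_data[key].to_dict(orient='records')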
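The failure mode behind this change can be reproduced without pandas. The sketch below is only an illustration under stated assumptions: a plain dict of columns stands in for the DataFrame (iterating either one yields column names, not rows), and `copy_entry`, `rows_from_columns`, and `try_copy_all` are hypothetical helpers, not graphrag or pandas functions. `rows_from_columns` plays the role of `DataFrame.to_dict(orient='records')`.

```python
# Stand-in for a DataFrame: a dict of columns. Like a DataFrame, iterating it
# yields the column names (strings), not the rows.
columns = {"id": ["0", "1"], "entity": ["ALICE-1", "BOB-2"]}


def copy_entry(entry):
    """Same expression as the failing line in the traceback."""
    return {k: entry[k] for k in entry}


def rows_from_columns(cols):
    """Row-wise view, comparable to DataFrame.to_dict(orient='records')."""
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]


def try_copy_all(entries):
    """Return the copied entries, or the TypeError message if copying fails."""
    try:
        return [copy_entry(entry) for entry in entries]
    except TypeError as exc:
        return f"TypeError: {exc}"


# Iterating the columns directly hands each column NAME to copy_entry,
# which then tries to index a string with a one-character string.
broken = try_copy_all(columns)

# Iterating row dicts gives copy_entry the mapping it expects.
fixed = try_copy_all(rows_from_columns(columns))

print(broken)
print(fixed)
```

With row dicts, each `entry` is a mapping again, so lookups such as `entry["id"]` behave as the original function assumed.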

vforkk, May 11 '25 03:05

In fact, the same error occurred with all keys, so I resolved it by modifying the update_context_data function as follows. I hope this helps.

from typing import Any


def update_context_data(
    context_data: Any,
    links: dict[str, Any],
) -> Any:
    """
    Update context data with the links dict so that it contains both the index name and community id.

    Parameters
    ----------
    - context_data (str | list[pd.DataFrame] | dict[str, pd.DataFrame]): The context data to update.
    - links (dict[str, Any]): A dictionary of links to the original dataframes.

    Returns
    -------
    str | list[pd.DataFrame] | dict[str, pd.DataFrame]: The updated context data.
    """
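    updated_context_data = {}
    for key in context_data:
        # Each value is expected to be a pandas DataFrame. Iterating a
        # DataFrame directly yields column names (strings), which is what
        # raised the TypeError, so convert it to a list of row dicts first.
        data = context_data[key]
        entries = data.to_dict(orient="records")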

        updated_entry = []

        if key == "reports":
            updated_entry = [
                dict(
                    entry,
                    index_name=links["community_reports"][int(entry["id"])]["index_name"],
                    index_id=links["community_reports"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "entities":
            updated_entry = [
                dict(
                    entry,
                    entity=entry["entity"].split("-")[0],
                    index_name=links["entities"][int(entry["id"])]["index_name"],
                    index_id=links["entities"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "relationships":
            updated_entry = [
                dict(
                    entry,
                    source=entry["source"].split("-")[0],
                    target=entry["target"].split("-")[0],
                    index_name=links["relationships"][int(entry["id"])]["index_name"],
                    index_id=links["relationships"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "claims":
            updated_entry = [
                dict(
                    entry,
                    entity=entry["entity"].split("-")[0],
                    index_name=links["covariates"][int(entry["id"])]["index_name"],
                    index_id=links["covariates"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        elif key == "sources":
            updated_entry = [
                dict(
                    entry,
                    index_name=links["text_units"][int(entry["id"])]["index_name"],
                    index_id=links["text_units"][int(entry["id"])]["id"],
                )
                for entry in entries
            ]
        updated_context_data[key] = updated_entry

    return updated_context_data
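The function above calls `.to_dict(orient="records")` on every value, so it assumes each value is a DataFrame. If some code path still passes a list of row dicts, that call would fail. A more defensive variant could normalize each value first. The following is a minimal sketch only: `to_records` is a hypothetical helper that is not part of graphrag, and the set of input types it accepts is an assumption.

```python
from typing import Any


def to_records(data: Any) -> list[dict[str, Any]]:
    """Normalize one context_data value to a list of row dicts.

    Hypothetical helper, not part of graphrag.
    """
    if isinstance(data, list):
        # Already row-oriented: copy each entry so callers can modify it.
        return [dict(entry) for entry in data]
    to_dict = getattr(data, "to_dict", None)
    if callable(to_dict):
        # DataFrame-like object: ask it for one dict per row.
        return to_dict(orient="records")
    raise TypeError(f"unsupported context data type: {type(data).__name__}")
```

Inside the loop, `entries = to_records(context_data[key])` would then replace the direct `.to_dict` call.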

takumint78, May 16 '25 07:05