FastChat
Describe the issue

I have locally deployed bge-large-zh-v1.5 + qwen2.5-3B-Instruct with GraphRAG 1.1.2, using Python 3.10.12 and torch 2.5. When I run `graphrag index --root ./`, I encounter the following error:
```
15:05:20,104 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:20,108 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:20,113 graphrag.utils.storage INFO reading table from storage: create_final_communities.parquet
15:05:20,130 graphrag.index.operations.summarize_communities.prepare_community_reports INFO Number of nodes at level=0 => 3
15:05:24,750 httpx INFO HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
15:05:24,912 graphrag.utils.storage INFO reading table from storage: create_final_documents.parquet
15:05:24,917 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:24,922 graphrag.utils.storage INFO reading table from storage: create_final_text_units.parquet
15:05:24,927 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:24,932 graphrag.utils.storage INFO reading table from storage: create_final_community_reports.parquet
15:05:24,942 graphrag.index.flows.generate_text_embeddings INFO Creating embeddings
15:05:24,942 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding entity.description: default-entity-description
15:05:25,143 graphrag.index.operations.embed_text.strategies.openai INFO embedding 3 inputs via 3 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,391 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 200 OK"
15:05:25,432 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding text_unit.text: default-text_unit-text
15:05:25,436 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,445 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding community.full_content: default-community-full_content
15:05:25,448 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,471 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 400 Bad Request"
15:05:25,475 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': ["# Family A\n\nThe community revolves around the key entities A, F, and M, who are related by familial ties. A is the child of F and M, and both F and M are parents of A. This family structure is central to the community's dynamics.\n\n## F and M as parents\n\nF and M are the parents of A, and their roles as parents are central to the community's structure. Their relationship with A is crucial in understanding the dynamics of the family. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A as the child\n\nA is the child of F and M, and their relationship with A is central to the community's structure. A's role as a child is significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## F and M's combined degree\n\nF and M have a combined degree of 3, indicating their significant role in the community. Their relationship with A is crucial in understanding the family dynamics and potential conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A's relationship with F and M\n\nA's relationship with F and M is central to the community's structure. Their roles as parents and the relationship with A are significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## Family structure\n\nThe family structure is central to the community's dynamics, with F and M as parents and A as the child. This structure is significant in understanding the potential for family disputes or conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]"], 'kwargs': {}}
15:05:25,476 graphrag.index.run.run_workflows ERROR error running workflow generate_text_embeddings
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/run/run_workflows.py", line 166, in _run_workflows
    result = await run_workflow(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/workflows/generate_text_embeddings.py", line 45, in run_workflow
    await generate_text_embeddings(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 98, in generate_text_embeddings
    await _run_and_snapshot_embeddings(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 121, in _run_and_snapshot_embeddings
    data["embedding"] = await embed_text(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 89, in embed_text
    return await _text_embed_with_vector_store(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 179, in _text_embed_with_vector_store
    result = await strategy_exec(
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 63, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 103, in _execute
    results = await asyncio.gather(*futures)
  File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 97, in embed
    chunk_embeddings = await llm(chunk)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 112, in __call__
    return await self._invoke(prompt, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 128, in _invoke
    return await self._decorated_target(prompt, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 109, in invoke
    result = await execute_with_retry()
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
    async for a in AsyncRetrying(
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 101, in execute_with_retry
    return await attempt()
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 78, in attempt
    return await delegate(prompt, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/rate_limiter.py", line 70, in invoke
    result = await delegate(prompt, **args)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 152, in _decorator_target
    output = await self._execute_llm(prompt, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fnllm/openai/llm/embeddings.py", line 133, in _execute_llm
    response = await self._call_embeddings_or_cache(
  File "/usr/local/lib/python3.10/dist-packages/fnllm/openai/llm/embeddings.py", line 110, in _call_embeddings_or_cache
    return await self._cache.get_or_insert(
  File "/usr/local/lib/python3.10/dist-packages/fnllm/services/cache_interactor.py", line 50, in get_or_insert
    entry = await func()
  File "/usr/local/lib/python3.10/dist-packages/openai/resources/embeddings.py", line 236, in create
    return await self._post(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1849, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1543, in request
    return await self._request(
  File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1644, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 513]. Tensor sizes: [1, 512])', 'code': 50001}
15:05:25,477 graphrag.callbacks.file_workflow_callbacks INFO Error running pipeline! details=None
15:05:25,554 graphrag.cli.index ERROR Errors occurred during the pipeline run, see logs for more details.
```
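The final `BadRequestError` points at the actual cause: bge-large-zh-v1.5 is a BERT-style encoder with a 512-position embedding table, so any input of 513+ tokens fails ("Target sizes: [1, 513]. Tensor sizes: [1, 512]"), while GraphRAG batches embedding inputs against `max_tokens=8191` counted with `cl100k_base`, so a long community report is sent through intact. As a minimal sketch of one client-side workaround (hypothetical helper, not GraphRAG's or FastChat's API): window the input to the model's limit, embed each window, and average the vectors.

```python
from typing import Callable, List

# bge-large-zh-v1.5 accepts at most 512 positions, hence "513 vs 512" above.
MAX_MODEL_TOKENS = 512

def embed_long_text(
    text: str,
    tokenize: Callable[[str], List[int]],
    detokenize: Callable[[List[int]], str],
    embed: Callable[[str], List[float]],
    max_tokens: int = MAX_MODEL_TOKENS,
) -> List[float]:
    """Embed arbitrarily long text by windowing to the model's limit and averaging."""
    tokens = tokenize(text)
    if not tokens:
        return embed(text)
    # Split the token sequence into windows the model can actually accept.
    windows = [tokens[i : i + max_tokens] for i in range(0, len(tokens), max_tokens)]
    vectors = [embed(detokenize(w)) for w in windows]
    # Unweighted mean of the per-window vectors, dimension by dimension.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
```

The `tokenize`/`detokenize`/`embed` callables are placeholders for the real model's tokenizer and an embeddings call; averaging loses some fidelity versus a model with a longer context, but it keeps requests under the 512-token cap.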
Steps to reproduce

No response
GraphRAG Config Used
This config file contains required core defaults that must be set, along with a handful of common optional settings.
For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/
LLM settings

There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

```yaml
encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: qwen3B
  model_supports_json: false # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  api_base: http://localhost:8000/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://localhost:8150/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
```
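For reference, the `max_batch_size=16, max_tokens=8191` values in the indexing log come from the embeddings batching settings in the generated settings.yaml (commented defaults). Note that these control how snippets are grouped into one request, with tokens counted by `encoding_model`, not by the embedding model's own tokenizer; lowering them does not cap the length of a single snippet, so it will not by itself avoid bge-large-zh-v1.5's 512-token window:

```yaml
embeddings:
  # batching defaults, surfaced in the log line
  # "embedding ... using 1 batches. max_batch_size=16, max_tokens=8191"
  batch_size: 16          # snippets per embeddings request
  batch_max_tokens: 8191  # token budget per batch (cl100k_base counts)
```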
Input settings
input: type: file # or blob file_type: text # or csv base_dir: "input" file_encoding: utf-8 file_pattern: ".*\.txt$"
chunks: size: 1200 overlap: 100 group_by_columns: [id]
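The `chunks` settings above slide a token window over each document; as an illustrative sketch only (not GraphRAG's implementation), `size`/`overlap` behave like this. Note a full 1200-token chunk is well above the 512-token window of bge-large-zh-v1.5, so `text_unit.text` embeddings can hit the same 400 on a larger corpus than this toy input:

```python
def chunk_tokens(tokens: list, size: int = 1200, overlap: int = 100) -> list:
    """Split a token sequence into windows of `size` tokens where
    consecutive windows share `overlap` tokens."""
    step = size - overlap
    # Stop once the remaining tail is fully covered by the previous window.
    return [tokens[i : i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```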
Storage settings

If blob storage is specified in the following four sections, connection_string and container_name must be provided.

```yaml
cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
# update_index_storage:
#   type: file # or blob
#   base_dir: "update_output"
```
Workflow settings

```yaml
skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 1000

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
  transient: false
```
Query settings

The prompt locations are required here, but each search method has a number of optional knobs that can be tuned. See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

```yaml
local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"
```

Logs and screenshots

```
(graphragtest) root@cdd2b6557714:/home/graphragtest# graphrag index --root ./
Logging enabled at /home/graphragtest/logs/indexing-engine.log
Running standard indexing.
🚀 create_base_text_units
   id                                                 text                      document_ids                                        n_tokens
0  b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a...  A的爸爸叫F。\n\nA的妈妈叫M。\n  [10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604d...  22
🚀 create_final_documents
   id                                                 human_readable_id  title       text                      text_unit_ids
0  10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604dd...  1                  report.txt  A的爸爸叫F。\n\nA的妈妈叫M。\n  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 extract_graph
None
🚀 compute_communities
   level  community  parent  title
0  0      0          -1      A
0  0      0          -1      F
0  0      0          -1      M
🚀 create_final_entities
   id                                    human_readable_id  title  type    description           text_unit_ids
0  c137ae10-4252-48da-894b-ca30f7aef684  0                  A      PERSON  A is a person         [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1  5650c001-6bb1-4868-bbf9-08a8a3f95892  1                  F      PERSON  F is the father of A  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
2  b447f3a1-29d2-4130-b586-da16499a79a2  2                  M      PERSON  M is the mother of A  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_relationships
   id                                    human_readable_id  source  target  description      weight  combined_degree  text_unit_ids
0  69ebb419-9b02-4ca0-8d42-12335857355f  0                  A       F       A's father is F  2.0     3                [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1  4860e8a2-b30f-4615-b127-64ddc3617535  1                  A       M       A's mother is M  2.0     3                [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_nodes
   id                                    human_readable_id  title  community  level  degree  x  y
0  c137ae10-4252-48da-894b-ca30f7aef684  0                  A      0          0      2       0  0
1  5650c001-6bb1-4868-bbf9-08a8a3f95892  1                  F      0          0      1       0  0
2  b447f3a1-29d2-4130-b586-da16499a79a2  2                  M      0          0      1       0  0
🚀 create_final_communities
   id                                    human_readable_id  community  ...  text_unit_ids                                       period      size
0  a48137e0-b5f5-4297-9919-50fb59ef270f  0                  0          ...  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...  2025-01-10  3

[1 rows x 11 columns]
🚀 create_final_text_units
   id                                                 ...  relationship_ids
0  b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a...  ...  [69ebb419-9b02-4ca0-8d42-12335857355f, 4860e8a...

[1 rows x 7 columns]
🚀 create_final_community_reports
   id                                human_readable_id  community  ...  full_content_json                                  period      size
0  54b5f0c3db3343f7a348d43a0ef6f086  0                  0          ...  {\n    "title": "Family A",\n    "summary": "T...  2025-01-10  3

[1 rows x 14 columns]
❌ generate_text_embeddings
None
⠼ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_documents ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── extract_graph ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── compute_communities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_entities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_relationships ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_nodes ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_communities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_text_units ━━━━━━━━━━ 100% 0:00:00 0:00:00
└── create_final_community_reports ━━━━━━━━━━ 100% 0:00:00 0:00:00
❌ Errors occurred during the pipeline run, see logs for more details.
```
Additional Information

- GraphRAG Version: 1.1.2
- Operating System: linux
- Python Version: 3.10.12
- Related Issues: