[Issue]: <NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.>
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [ ] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the issue
I have a local deployment of bge-large-zh-v1.5 + qwen2.5-3B-Instruct with GraphRAG 1.1.2, running on Python 3.10.12 and torch 2.5. When I run `graphrag index --root ./`, I encounter the following error:
15:05:20,104 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:20,108 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:20,113 graphrag.utils.storage INFO reading table from storage: create_final_communities.parquet
15:05:20,130 graphrag.index.operations.summarize_communities.prepare_community_reports INFO Number of nodes at level=0 => 3
15:05:24,750 httpx INFO HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
15:05:24,912 graphrag.utils.storage INFO reading table from storage: create_final_documents.parquet
15:05:24,917 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:24,922 graphrag.utils.storage INFO reading table from storage: create_final_text_units.parquet
15:05:24,927 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:24,932 graphrag.utils.storage INFO reading table from storage: create_final_community_reports.parquet
15:05:24,942 graphrag.index.flows.generate_text_embeddings INFO Creating embeddings
15:05:24,942 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding entity.description: default-entity-description
15:05:25,143 graphrag.index.operations.embed_text.strategies.openai INFO embedding 3 inputs via 3 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,391 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 200 OK"
15:05:25,432 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding text_unit.text: default-text_unit-text
15:05:25,436 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,445 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding community.full_content: default-community-full_content
15:05:25,448 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,471 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 400 Bad Request"
15:05:25,475 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': ["# Family A\n\nThe community revolves around the key entities A, F, and M, who are related by familial ties. A is the child of F and M, and both F and M are parents of A. This family structure is central to the community's dynamics.\n\n## F and M as parents\n\nF and M are the parents of A, and their roles as parents are central to the community's structure. Their relationship with A is crucial in understanding the dynamics of the family. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A as the child\n\nA is the child of F and M, and their relationship with A is central to the community's structure. A's role as a child is significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## F and M's combined degree\n\nF and M have a combined degree of 3, indicating their significant role in the community. Their relationship with A is crucial in understanding the family dynamics and potential conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A's relationship with F and M\n\nA's relationship with F and M is central to the community's structure. Their roles as parents and the relationship with A are significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## Family structure\n\nThe family structure is central to the community's dynamics, with F and M as parents and A as the child. This structure is significant in understanding the potential for family disputes or conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]"], 'kwargs': {}}
15:05:25,476 graphrag.index.run.run_workflows ERROR error running workflow generate_text_embeddings
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/run/run_workflows.py", line 166, in _run_workflows
result = await run_workflow(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/workflows/generate_text_embeddings.py", line 45, in run_workflow
await generate_text_embeddings(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 98, in generate_text_embeddings
await _run_and_snapshot_embeddings(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 121, in _run_and_snapshot_embeddings
data["embedding"] = await embed_text(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 89, in embed_text
return await _text_embed_with_vector_store(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 179, in _text_embed_with_vector_store
result = await strategy_exec(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 63, in run
embeddings = await _execute(llm, text_batches, ticker, semaphore)
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 103, in _execute
results = await asyncio.gather(*futures)
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 97, in embed
chunk_embeddings = await llm(chunk)
File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 112, in __call__
return await self._invoke(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 128, in _invoke
return await self._decorated_target(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 109, in invoke
result = await execute_with_retry()
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
async for a in AsyncRetrying(
File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/usr/local/lib/python3.10/dist-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 398, in
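The tensor-size mismatch reported by the embedding server (see the 400 body under Additional Information: "Target sizes: [1, 513]. Tensor sizes: [1, 512]") is consistent with bge-large-zh-v1.5 being a BERT-style encoder with a 512-position embedding table. A minimal sketch of the token budget, assuming the standard [CLS]/[SEP] wrapping:

```python
# bge-large-zh-v1.5 is BERT-based: max_position_embeddings = 512.
# Its tokenizer wraps every input as [CLS] ... content ... [SEP],
# so the usable content budget is two tokens smaller than the table.
MAX_POSITIONS = 512   # size of the model's position-embedding table
SPECIAL_TOKENS = 2    # [CLS] + [SEP]

content_budget = MAX_POSITIONS - SPECIAL_TOKENS
print(content_budget)  # 510

# The failing community-report text tokenized to 513 positions,
# one past the table, which is exactly the "[1, 513] vs [1, 512]"
# mismatch in the 400 response body.
overflow = 513 - MAX_POSITIONS
print(overflow)  # 1
```

The document itself being under 1000 tokens does not help here: the input that overflows is the generated community report, not the source document.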
Steps to reproduce
No response
GraphRAG Config Used
```yaml
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: qwen3B
  model_supports_json: false # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  api_base: http://localhost:8000/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://localhost:8150/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 1000

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"
```
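Note on the config: the log line `max_batch_size=16, max_tokens=8191` reflects GraphRAG's embedding defaults, which are sized for OpenAI models with an 8191-token window, far above bge-large-zh-v1.5's 512 positions. If I read the settings docs correctly, `embeddings.batch_size` and `embeddings.batch_max_tokens` control that budget, and `batch_max_tokens` is also the snippet size each input is split into. A sketch of a tightened block (values are illustrative; token counts use `encoding_model`'s cl100k_base tokenizer, not bge's own, so leave headroom):

```yaml
embeddings:
  batch_size: 16
  batch_max_tokens: 400   # well under the 512-position window, with headroom
                          # for the cl100k_base-vs-BERT tokenizer mismatch
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: gpt-4                        # name is only passed through to the local server
    api_base: http://localhost:8150/v1
```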
Logs and screenshots
(graphragtest) root@cdd2b6557714:/home/graphragtest# graphrag index --root ./
Logging enabled at /home/graphragtest/logs/indexing-engine.log
Running standard indexing.
🚀 create_base_text_units
   id  text  document_ids  n_tokens
0  b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a...  A的爸爸叫F。\n\nA的妈妈叫M。\n  [10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604d...  22
🚀 create_final_documents
   id  human_readable_id  title  text  text_unit_ids
0  10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604dd...  1  report.txt  A的爸爸叫F。\n\nA的妈妈叫M。\n  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 extract_graph
None
🚀 compute_communities
   level  community  parent  title
0  0  0  -1  A
0  0  0  -1  F
0  0  0  -1  M
🚀 create_final_entities
   id  human_readable_id  title  type  description  text_unit_ids
0  c137ae10-4252-48da-894b-ca30f7aef684  0  A  PERSON  A is a person  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1  5650c001-6bb1-4868-bbf9-08a8a3f95892  1  F  PERSON  F is the father of A  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
2  b447f3a1-29d2-4130-b586-da16499a79a2  2  M  PERSON  M is the mother of A  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_relationships
   id  human_readable_id  source  target  description  weight  combined_degree  text_unit_ids
0  69ebb419-9b02-4ca0-8d42-12335857355f  0  A  F  A's father is F  2.0  3  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1  4860e8a2-b30f-4615-b127-64ddc3617535  1  A  M  A's mother is M  2.0  3  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_nodes
   id  human_readable_id  title  community  level  degree  x  y
0  c137ae10-4252-48da-894b-ca30f7aef684  0  A  0  0  2  0  0
1  5650c001-6bb1-4868-bbf9-08a8a3f95892  1  F  0  0  1  0  0
2  b447f3a1-29d2-4130-b586-da16499a79a2  2  M  0  0  1  0  0
🚀 create_final_communities
   id  human_readable_id  community  ...  text_unit_ids  period  size
0  a48137e0-b5f5-4297-9919-50fb59ef270f  0  0  ...  [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...  2025-01-10  3
[1 rows x 11 columns]
🚀 create_final_text_units
   id  ...  relationship_ids
0  b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a...  ...  [69ebb419-9b02-4ca0-8d42-12335857355f, 4860e8a...
[1 rows x 7 columns]
🚀 create_final_community_reports
   id  human_readable_id  community  ...  full_content_json  period  size
0  54b5f0c3db3343f7a348d43a0ef6f086  0  0  ...  {\n  "title": "Family A",\n  "summary": "T...  2025-01-10  3
[1 rows x 14 columns]
❌ generate_text_embeddings
None
⠼ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_documents ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── extract_graph ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── compute_communities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_entities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_relationships ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_nodes ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_communities ━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_text_units ━━━━━━━━━━ 100% 0:00:00 0:00:00
└── create_final_community_reports ━━━━━━━━━━ 100% 0:00:00 0:00:00
❗ Errors occurred during the pipeline run, see logs for more details.
Additional Information
- GraphRAG Version: 1.1.2
- Operating System: linux
- Python Version: 3.10.12
- Related Issues:
Isn't this the true reason for the failure? openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 513]. Tensor sizes: [1, 512])', 'code': 50001}
It seems the tensor is too long, but my document is under 1000 tokens. I don't know how to fix it.
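For anyone hitting the same mismatch: the overflowing input is the generated community report, which can exceed the encoder's window even when the source document is short. One workaround, independent of GraphRAG, is to truncate each input to the model's window before it reaches the encoder, which is what `truncation=True` with `max_length=512` does in Hugging Face tokenizers. A sketch of that pattern, using a stand-in whitespace tokenizer (`truncate_to_window` is a hypothetical helper, not a GraphRAG or transformers API):

```python
def truncate_to_window(tokens: list[str], max_positions: int = 512,
                       special_tokens: int = 2) -> list[str]:
    """Keep at most (max_positions - special_tokens) content tokens,
    mirroring what truncation=True does in a real tokenizer."""
    return tokens[: max_positions - special_tokens]

# Stand-in tokenizer: whitespace split. A real deployment would use
# bge-large-zh-v1.5's own tokenizer (e.g. transformers' AutoTokenizer
# with truncation=True, max_length=512) or the serving framework's
# truncation option.
text = " ".join(f"tok{i}" for i in range(600))  # 600 pseudo-tokens, over budget
clipped = truncate_to_window(text.split())
print(len(clipped))  # 510
```

Truncation loses the tail of each report, so shrinking GraphRAG's embedding batch/snippet token limit in settings.yaml is the cleaner fix when available.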
Routing to #657