[Bug]: Embedding not happening - constant errors
Issue #2257 - Opened by @ndrewpj
Description
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
Latest v1.4.9.4, using Ollama with Qwen3-Embedding-8B (q8_0).
Cannot get any embeddings for MS Word and PPTX files.
Steps to reproduce
No response
Expected Behavior
No response
LightRAG Config Used
###########################
# Server Configuration
###########################
HOST=0.0.0.0
PORT=9621
WEBUI_TITLE='Arch Graph KB'
WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"
WORKERS=2
# gunicorn worker timeout (used as the default LLM request timeout if LLM_TIMEOUT is not set)
TIMEOUT=350
CORS_ORIGINS=http://localhost:3000,http://localhost:8080
# Directory Configuration (defaults to current working directory)
INPUT_DIR=<absolute_path_for_doc_input_dir>
WORKING_DIR=<absolute_path_for_working_dir>
# Tiktoken cache directory
TIKTOKEN_CACHE_DIR=./temp/tiktoken
# Logging level
LOG_LEVEL=INFO
VERBOSE=False
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
LOG_DIR=/path/to/log/directory
########################################
# Document processing configuration
########################################
ENABLE_LLM_CACHE_FOR_EXTRACT=true
# Document processing output language
SUMMARY_LANGUAGE=Russian
# Entity types that the LLM will attempt to recognize
ENTITY_TYPES='["Роль", "Процесс", "Правило", "Функция", "Архитектура", "Система", "ДЗО", "Компания", "Сервис", "Подразделение", "Стек", "Программное обеспечение", "Критерии", "Область"]'
# Chunk size for document splitting, 500~1500 is recommended
CHUNK_SIZE=800
CHUNK_OVERLAP_SIZE=80
# Number of summary segments or tokens to trigger LLM summary on entity/relation merge
FORCE_LLM_SUMMARY_ON_MERGE=8
# Max description token size to trigger LLM summary
SUMMARY_MAX_TOKENS=1200
# Recommended LLM summary output length in tokens
SUMMARY_LENGTH_RECOMMENDED_=600
# Maximum context size sent to LLM for description summary
SUMMARY_CONTEXT_SIZE=12000
###############################
# Concurrency Configuration
###############################
# Max concurrent LLM requests (for both query and document processing)
MAX_ASYNC=1
# Number of documents processed in parallel
MAX_PARALLEL_INSERT=2
# Max concurrent embedding requests
EMBEDDING_FUNC_MAX_ASYNC=1
# Number of chunks sent to embedding in a single request
EMBEDDING_BATCH_NUM=2
###########################################################
# LLM Configuration
###########################################################
# LLM request timeout setting for all LLM calls
LLM_TIMEOUT=360
LLM_BINDING=ollama
LLM_MODEL=gpt-oss:20b_32k
LLM_BINDING_HOST=http://172.17.0.1:27171
LLM_BINDING_API_KEY=1
OLLAMA_LLM_TEMPERATURE=0.0
####################################################################################
# Embedding Configuration
####################################################################################
EMBEDDING_TIMEOUT=330
EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=qwen3-embedding:8b-q8_0
EMBEDDING_DIM=4096
EMBEDDING_BINDING_API_KEY=2
EMBEDDING_BINDING_HOST=http://172.17.0.1:27171
# Optional for Ollama embedding
OLLAMA_EMBEDDING_NUM_CTX=8192
####################################################################
# WORKSPACE Configuration
####################################################################
WORKSPACE=arch
############################
# Data storage selection
############################
# Default storage (Recommended for small scale deployment)
LIGHTRAG_KV_STORAGE=JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
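A quick way to rule the embedding endpoint in or out is to call it directly with the host and model from the config above. This is a diagnostic sketch only, assuming Ollama's /api/embed endpoint is reachable from the machine running LightRAG and that the model tag matches what "ollama list" reports:
import httpx

# Host and model taken from the .env above; adjust if your deployment differs.
HOST = "http://172.17.0.1:27171"
MODEL = "qwen3-embedding:8b-q8_0"

resp = httpx.post(
    f"{HOST}/api/embed",  # Ollama's embedding endpoint
    json={"model": MODEL, "input": "smoke test"},
    timeout=60.0,
)
resp.raise_for_status()
vector = resp.json()["embeddings"][0]
print(len(vector))  # should match EMBEDDING_DIM=4096 from the config above
If this call itself stalls or times out, that points at the Ollama side (model loading, memory pressure) rather than at LightRAG's document parsing.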
Logs and screenshots
Error Log
Processing d-id: doc-f7e3760cac75f182288387f0a351c618
Failed to extract entities and relationships: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
Traceback (most recent call last):
File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
raise exc from None
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
response = await connection.handle_async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
return await self._connection.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
raise exc
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
) = await self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
event = await self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
data = await self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
with map_exceptions(exc_map):
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in exit
self.gen.throw(value)
File "/app/.venv/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2706, in _process_with_semaphore
return await _process_single_content(chunk)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 2599, in _process_single_content
final_result, timestamp = await use_llm_func_with_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 1698, in use_llm_func_with_cache
res: str = await use_llm_func(
^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 847, in wait_func
return await future
^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 551, in worker
result = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
return await fut
^^^^^^^^^
File "/app/lightrag/llm/ollama.py", line 135, in ollama_model_complete
return await _ollama_model_if_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
return await copy(fn, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
do = await self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
result = await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/llm/ollama.py", line 109, in _ollama_model_if_cache
raise e
File "/app/lightrag/llm/ollama.py", line 72, in _ollama_model_if_cache
response = await ollama_client.chat(model=model, messages=messages, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 953, in chat
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 751, in _request
return cls((await self._request_raw(*args, **kwargs)).json())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 691, in _request_raw
r = await self._client.request(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1540, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1629, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1730, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
with map_httpcore_exceptions():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in exit
self.gen.throw(value)
File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2710, in _process_with_semaphore
raise prefixed_exception from e
httpx.ReadTimeout: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lightrag/lightrag.py", line 1804, in process_document
await entity_relation_task
File "/app/lightrag/lightrag.py", line 2031, in _process_extract_entities
raise e
File "/app/lightrag/lightrag.py", line 2016, in _process_extract_entities
chunk_results = await extract_entities(
^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 2752, in extract_entities
raise prefixed_exception from first_exception
httpx.ReadTimeout: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
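Note that the root failure in the trace above is an httpx.ReadTimeout raised from ollama_client.chat during entity extraction, so it is the LLM call that times out, not the embedding call. For reference, a minimal self-contained sketch of how httpx produces this class of error (deliberately tiny read timeout against a public URL; nothing here is taken from this deployment):
import asyncio
import httpx

async def main() -> None:
    # The request is written, but the client gives up while waiting for the
    # response headers: the same failure mode as an overloaded Ollama server
    # that is still busy generating (httpcore.ReadTimeout wrapped as httpx.ReadTimeout).
    timeout = httpx.Timeout(connect=10.0, read=0.001, write=10.0, pool=10.0)
    async with httpx.AsyncClient(timeout=timeout) as client:
        try:
            await client.get("https://example.com/")
        except httpx.ReadTimeout as exc:
            print("ReadTimeout:", repr(exc))

asyncio.run(main())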
Additional Information
- LightRAG Version: 1.4.9.4
- Operating System: Ubuntu 24.04.3 LTS
- Python Version: 3.12
Related Issues
No related issues mentioned
The error indicates that the LLM invocation timed out. This is likely due to insufficient local computational resources. LightRAG recommends a minimum context length of 32KB for the LLM. By default, LightRAG is configured with a concurrency limit of 6 for LLM requests. However, Ollama’s default concurrency is set to 1, which can easily lead to timeouts under concurrent workloads.
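If you stay on Ollama, the server-side limits should also match the concurrency LightRAG is configured for. As a hedged illustration, these are standard Ollama server environment variables; the values are examples, not settings taken from this issue:
OLLAMA_NUM_PARALLEL=2         # number of requests each loaded model serves concurrently
OLLAMA_MAX_LOADED_MODELS=2    # keep the LLM and the embedding model resident together
OLLAMA_KEEP_ALIVE=30m         # reduce unload/reload churn between requests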
Please ensure your local hardware resources are compatible with LightRAG’s configuration. Key settings to verify include:
###############################
### Concurrency Configuration
###############################
### Max concurrent LLM requests (for both query and document processing)
MAX_ASYNC=6
### Number of documents processed in parallel (between 2~10, MAX_ASYNC/3 is recommended)
MAX_PARALLEL_INSERT=3
### Max concurrent embedding requests
EMBEDDING_FUNC_MAX_ASYNC=8
### Number of chunks sent to embedding in a single request
EMBEDDING_BATCH_NUM=16
### OLLAMA_LLM_NUM_CTX must be larger than MAX_TOTAL_TOKENS + 2000
OLLAMA_LLM_NUM_CTX=8192
########################################
### Document processing configuration
########################################
### Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended)
# FORCE_LLM_SUMMARY_ON_MERGE=8
### Max description token size to trigger LLM summary
# SUMMARY_MAX_TOKENS = 1200
### Recommended LLM summary output length in tokens
# SUMMARY_LENGTH_RECOMMENDED=600
### Maximum context size sent to LLM for description summary
# SUMMARY_CONTEXT_SIZE=12000
########################
### Query Configuration
########################
# LLM response cache for query (not valid for streaming responses)
ENABLE_LLM_CACHE=true
# COSINE_THRESHOLD=0.2
### Number of entities or relations retrieved from KG
# TOP_K=40
### Maximum number of chunks planned to be sent to the LLM
# CHUNK_TOP_K=20
### control the actual entities sent to the LLM
# MAX_ENTITY_TOKENS=6000
### control the actual relations sent to the LLM
# MAX_RELATION_TOKENS=8000
### control the maximum tokens sent to the LLM (including entities, relations and chunks)
# MAX_TOTAL_TOKENS=30000
### maximum number of related chunks per source entity or relation (higher values increase re-ranking time)
# RELATED_CHUNK_NUMBER=5
When GPU memory is insufficient, the Ollama server will repeatedly unload and reload both the embedding and LLM models, severely degrading performance. We therefore recommend using vLLM or sglang for model deployment to achieve better efficiency and stability.
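For reference, a minimal sketch of the vLLM route (model name, port and key are placeholders, not something verified in this issue): vLLM exposes an OpenAI-compatible API, so LightRAG can talk to it through the openai binding.
# Start vLLM's OpenAI-compatible server, e.g.:
#   vllm serve Qwen/Qwen2.5-14B-Instruct --port 8000
# Then point LightRAG at it:
LLM_BINDING=openai
LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
LLM_BINDING_HOST=http://127.0.0.1:8000/v1
LLM_BINDING_API_KEY=dummy-key    # vLLM only enforces a key when started with --api-key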