
[Bug]:Embedding not happening - constant errors

Open ndrewpj opened this issue 2 months ago • 3 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

Latest v1.4.9.4, using Ollama with Qwen3-Embedding-8B (q8_0).

Cannot get any embeddings for MS Word and .pptx files; the error log is below.

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used


This is a sample of the .env file:

###########################
# Server Configuration
###########################
HOST=0.0.0.0
PORT=9621
WEBUI_TITLE='Arch Graph KB'
WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"
WORKERS=2

# gunicorn worker timeout (used as the default LLM request timeout if LLM_TIMEOUT is not set)
TIMEOUT=350

CORS_ORIGINS=http://localhost:3000,http://localhost:8080

# Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

# Directory Configuration (defaults to current working directory)
# Default value is ./inputs and ./rag_storage
INPUT_DIR=<absolute_path_for_doc_input_dir>
WORKING_DIR=<absolute_path_for_working_dir>

# Tiktoken cache directory (store cached files in this folder for offline deployment)
TIKTOKEN_CACHE_DIR=./temp/tiktoken

# Ollama Emulating Model and Tag
OLLAMA_EMULATING_MODEL_NAME=lightrag
OLLAMA_EMULATING_MODEL_TAG=latest

# Max nodes returned from graph retrieval in the WebUI
MAX_GRAPH_NODES=1000

# Logging level
LOG_LEVEL=INFO
VERBOSE=False
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5

# Logfile location (defaults to current working directory)
LOG_DIR=/path/to/log/directory

# How to control the context length sent to LLM:
# MAX_ENTITY_TOKENS + MAX_RELATION_TOKENS < MAX_TOTAL_TOKENS
# Chunk_Tokens = MAX_TOTAL_TOKENS - Actual_Entity_Tokens - Actual_Relation_Tokens
######################################################################################

# LLM response cache for query (not valid for streaming responses)
ENABLE_LLM_CACHE=true
COSINE_THRESHOLD=0.2

# Number of entities or relations retrieved from KG
TOP_K=40

# Maximum number of chunks for naive vector search
CHUNK_TOP_K=20

# Control the actual entities sent to LLM
MAX_ENTITY_TOKENS=6000

# Control the actual relations sent to LLM
MAX_RELATION_TOKENS=8000

# Control the maximum tokens sent to LLM (including entities, relations and chunks)
MAX_TOTAL_TOKENS=30000

# Maximum number of related chunks per source entity or relation
# The chunk picker uses this value to determine the total number of chunks selected from the KG (knowledge graph)
# Higher values increase re-ranking time
#RELATED_CHUNK_NUMBER=5

# Chunk selection strategies
# VECTOR: pick KG chunks by vector similarity; the delivered chunks align more closely with naive retrieval
# WEIGHT: pick KG chunks by entity and chunk weight; delivers more purely KG-related chunks to the LLM
# If reranking is enabled, the impact of the chunk selection strategy is diminished.
#KG_CHUNK_PICK_METHOD=WEIGHT

#########################################################
# Reranking configuration
# RERANK_BINDING type: null, cohere, jina, aliyun
# For a rerank model deployed with vLLM, use the cohere binding
#########################################################
RERANK_BINDING=null

# Enable rerank by default in query params when RERANK_BINDING is not null
RERANK_BY_DEFAULT=True

# Rerank score chunk filter (set to 0.0 to keep all chunks, 0.6 or above if the LLM is not strong enough)
MIN_RERANK_SCORE=0.0

# For local deployment with vLLM
#RERANK_MODEL=gte-multilingual-reranker-base
#RERANK_BINDING_HOST=http://172.17..0.1:8888/v1/rerank
#RERANK_BINDING_API_KEY=11

# Default value for Cohere AI
# RERANK_MODEL=rerank-v3.5
# RERANK_BINDING_HOST=https://api.cohere.com/v2/rerank
# RERANK_BINDING_API_KEY=your_rerank_api_key_here

# Default value for Jina AI
# RERANK_MODEL=jina-reranker-v2-base-multilingual
# RERANK_BINDING_HOST=https://api.jina.ai/v1/rerank
# RERANK_BINDING_API_KEY=your_rerank_api_key_here

# Default value for Aliyun
# RERANK_MODEL=gte-rerank-v2
# RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
# RERANK_BINDING_API_KEY=your_rerank_api_key_here

########################################
# Document processing configuration
########################################
ENABLE_LLM_CACHE_FOR_EXTRACT=true

# Document processing output language: English, Chinese, French, German ...
SUMMARY_LANGUAGE=Russian

# Entity types that the LLM will attempt to recognize
ENTITY_TYPES='["Роль", "Процесс", "Правило", "Функция", "Архитектура", "Система", "ДЗО", "Компания", "Сервис", "Подразделение", "Стек", "Программное обеспечение", "Критерии", "Область"]'

# Chunk size for document splitting, 500~1500 is recommended
CHUNK_SIZE=800
CHUNK_OVERLAP_SIZE=80

# Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended)
FORCE_LLM_SUMMARY_ON_MERGE=8

# Max description token size to trigger LLM summary
SUMMARY_MAX_TOKENS=1200

# Recommended LLM summary output length in tokens
SUMMARY_LENGTH_RECOMMENDED_=600

# Maximum context size sent to LLM for description summary
SUMMARY_CONTEXT_SIZE=12000

###############################
# Concurrency Configuration
###############################

# Max concurrent LLM requests (for both query and document processing)
MAX_ASYNC=1

# Number of documents processed in parallel (between 2~10, MAX_ASYNC/3 is recommended)
MAX_PARALLEL_INSERT=2

# Max concurrent embedding requests
EMBEDDING_FUNC_MAX_ASYNC=1

# Number of chunks sent to embedding in a single request
EMBEDDING_BATCH_NUM=2

###########################################################
# LLM Configuration
# LLM_BINDING type: openai, ollama, lollms, azure_openai, aws_bedrock
###########################################################

# LLM request timeout for all LLMs (0 means no timeout for Ollama)
LLM_TIMEOUT=360

LLM_BINDING=ollama
LLM_MODEL=gpt-oss:20b_32k
LLM_BINDING_HOST=http://172.17.0.1:27171
LLM_BINDING_API_KEY=1
OLLAMA_LLM_TEMPERATURE=0.0

# Optional for Azure
# AZURE_OPENAI_API_VERSION=2024-08-01-preview
# AZURE_OPENAI_DEPLOYMENT=gpt-4o

# Openrouter example
# LLM_MODEL=google/gemini-2.5-flash
# LLM_BINDING_HOST=https://openrouter.ai/api/v1
# LLM_BINDING_API_KEY=your_api_key
# LLM_BINDING=openai
# OPENAI_LLM_MAX_COMPLETION_TOKENS=9000

# OpenAI's new API utilizes max_completion_tokens instead of max_tokens
# OPENAI_LLM_MAX_COMPLETION_TOKENS=9000

# Use the following command to see all supported options for OpenAI, azure_openai or OpenRouter:
# lightrag-server --llm-binding openai --help

# OpenAI Specific Parameters
# OPENAI_LLM_REASONING_EFFORT=minimal

# OpenRouter Specific Parameters
# OPENAI_LLM_EXTRA_BODY='{"reasoning": {"enabled": false}}'

# Qwen3 Specific Parameters (deployed with vLLM)
# OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'

# Use the following command to see all supported options for Ollama LLM:
# lightrag-server --llm-binding ollama --help

# Ollama Server Specific Parameters
# OLLAMA_LLM_NUM_CTX must be provided, and should be at least MAX_TOTAL_TOKENS + 2000
OLLAMA_LLM_NUM_CTX=32000

# Set max_output_tokens to mitigate endless output of some LLMs (less than LLM_TIMEOUT * llm_output_tokens/second, e.g. 9000 = 180s * 50 tokens/s)
OLLAMA_LLM_NUM_PREDICT=9000

# Stop sequences for Ollama LLM
OLLAMA_LLM_STOP='["", "<|EOT|>"]'

# Bedrock Specific Parameters
# BEDROCK_LLM_TEMPERATURE=1.0

####################################################################################
# Embedding Configuration (should not be changed after the first file is processed)
# EMBEDDING_BINDING: ollama, openai, azure_openai, jina, lollms, aws_bedrock
####################################################################################
EMBEDDING_TIMEOUT=330
EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=qwen3-embedding:8b-q8_0
EMBEDDING_DIM=4096
EMBEDDING_BINDING_API_KEY=2

# If the embedding service is deployed within the same Docker stack, use host.docker.internal instead of localhost
EMBEDDING_BINDING_HOST=http://172.17.0.1:27171

# Optional for Azure
# AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
# AZURE_EMBEDDING_API_VERSION=2023-05-15
# AZURE_EMBEDDING_ENDPOINT=your_endpoint
# AZURE_EMBEDDING_API_KEY=your_api_key

# Jina AI Embedding
# EMBEDDING_BINDING=jina
# EMBEDDING_BINDING_HOST=https://api.jina.ai/v1/embeddings
# EMBEDDING_MODEL=jina-embeddings-v4
# EMBEDDING_DIM=2048
# EMBEDDING_BINDING_API_KEY=your_api_key

# Optional for Ollama embedding
OLLAMA_EMBEDDING_NUM_CTX=8192

# Use the following command to see all supported options for Ollama embedding:
# lightrag-server --embedding-binding ollama --help

####################################################################
# WORKSPACE sets the workspace name for all storage types,
# for the purpose of isolating data between LightRAG instances.
# Valid workspace name constraints: a-z, A-Z, 0-9, and _
####################################################################
WORKSPACE=arch

############################
# Data storage selection
############################

# Default storage (recommended for small scale deployment)
LIGHTRAG_KV_STORAGE=JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage

# Redis Storage (recommended for production deployment)
# LIGHTRAG_KV_STORAGE=RedisKVStorage
# LIGHTRAG_DOC_STATUS_STORAGE=RedisDocStatusStorage

# Vector Storage (recommended for production deployment)
# LIGHTRAG_VECTOR_STORAGE=MilvusVectorDBStorage
# LIGHTRAG_VECTOR_STORAGE=QdrantVectorDBStorage
# LIGHTRAG_VECTOR_STORAGE=FaissVectorDBStorage
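
For reference, a minimal standalone sketch to confirm the embedding endpoint itself responds, outside LightRAG (assuming a reasonably recent Ollama build that exposes /api/embed; host, model and expected dimension are taken from the .env above):

import httpx

OLLAMA_HOST = "http://172.17.0.1:27171"   # EMBEDDING_BINDING_HOST above
MODEL = "qwen3-embedding:8b-q8_0"         # EMBEDDING_MODEL above

# One small request; if this hangs or errors, LightRAG's embedding stage cannot work either.
resp = httpx.post(
    f"{OLLAMA_HOST}/api/embed",
    json={"model": MODEL, "input": "test sentence"},
    timeout=330,                          # mirrors EMBEDDING_TIMEOUT
)
resp.raise_for_status()
print(len(resp.json()["embeddings"][0]))  # expected to print 4096, matching EMBEDDING_DIM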

Logs and screenshots

Processing d-id: doc-f7e3760cac75f182288387f0a351c618
Failed to extract entities and relationships: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
    yield
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
    raise exc from None
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
    raise exc
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/operate.py", line 2706, in _process_with_semaphore
    return await _process_single_content(chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/operate.py", line 2599, in _process_single_content
    final_result, timestamp = await use_llm_func_with_cache(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 1698, in use_llm_func_with_cache
    res: str = await use_llm_func(
               ^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 847, in wait_func
    return await future
           ^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 551, in worker
    result = await asyncio.wait_for(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/app/lightrag/llm/ollama.py", line 135, in ollama_model_complete
    return await _ollama_model_if_cache(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                      ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/llm/ollama.py", line 109, in _ollama_model_if_cache
    raise e
  File "/app/lightrag/llm/ollama.py", line 72, in _ollama_model_if_cache
    response = await ollama_client.chat(model=model, messages=messages, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 953, in chat
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 751, in _request
    return cls((await self._request_raw(*args, **kwargs)).json())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 691, in _request_raw
    r = await self._client.request(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1540, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1730, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
    with map_httpcore_exceptions():
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/operate.py", line 2710, in _process_with_semaphore
    raise prefixed_exception from e
httpx.ReadTimeout: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/lightrag.py", line 1804, in process_document
    await entity_relation_task
  File "/app/lightrag/lightrag.py", line 2031, in _process_extract_entities
    raise e
  File "/app/lightrag/lightrag.py", line 2016, in _process_extract_entities
    chunk_results = await extract_entities(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/operate.py", line 2752, in extract_entities
    raise prefixed_exception from first_exception
httpx.ReadTimeout: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
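
For reference, the timeout in this log is raised by the Ollama chat call made during entity extraction, not by the embedding call itself. A minimal standalone sketch that times the same kind of call outside LightRAG (host, model and num_ctx mirror the .env above; the one-word prompt is only illustrative):

import time
from ollama import Client   # the same client library shown in the traceback

client = Client(host="http://172.17.0.1:27171")     # LLM_BINDING_HOST above
t0 = time.time()
resp = client.chat(
    model="gpt-oss:20b_32k",                         # LLM_MODEL above
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    options={"num_ctx": 32000, "num_predict": 64},   # mirrors OLLAMA_LLM_NUM_CTX
)
print(resp["message"]["content"], f"{time.time() - t0:.1f}s")
# If even a trivial prompt takes minutes, real extraction prompts will exceed LLM_TIMEOUT (360 s).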

Additional Information

  • LightRAG Version: 1.4.9.4
  • Operating System: Ubuntu 24.04.3 LTS
  • Python Version: 3.12
  • Related Issues:

ndrewpj avatar Oct 24 '25 07:10 ndrewpj

tried to declutter and make it readable:

[Bug]: Embedding not happening - constant errors

Issue #2257 - Opened by @ndrewpj

Description

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

Latest v1.4.9.4, using Ollama with Qwen3-Embedding-8B (q8_0).

Cannot get any embeddings for MS Word and .pptx files.

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

###########################
# Server Configuration
###########################
HOST=0.0.0.0
PORT=9621
WEBUI_TITLE='Arch Graph KB'
WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"
WORKERS=2

# gunicorn worker timeout(as default LLM request timeout if LLM_TIMEOUT is not set)
TIMEOUT=350

CORS_ORIGINS=http://localhost:3000,http://localhost:8080

# Directory Configuration (defaults to current working directory)
INPUT_DIR=<absolute_path_for_doc_input_dir>
WORKING_DIR=<absolute_path_for_working_dir>

# Tiktoken cache directory
TIKTOKEN_CACHE_DIR=./temp/tiktoken

# Logging level
LOG_LEVEL=INFO
VERBOSE=False
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
LOG_DIR=/path/to/log/directory

########################################
# Document processing configuration
########################################
ENABLE_LLM_CACHE_FOR_EXTRACT=true

# Document processing output language
SUMMARY_LANGUAGE=Russian

# Entity types that the LLM will attempt to recognize
ENTITY_TYPES='["Роль", "Процесс", "Правило", "Функция", "Архитектура", "Система", "ДЗО", "Компания", "Сервис", "Подразделение", "Стек", "Программное обеспечение", "Критерии", "Область"]'

# Chunk size for document splitting, 500~1500 is recommended
CHUNK_SIZE=800
CHUNK_OVERLAP_SIZE=80

# Number of summary segments or tokens to trigger LLM summary on entity/relation merge
FORCE_LLM_SUMMARY_ON_MERGE=8

# Max description token size to trigger LLM summary
SUMMARY_MAX_TOKENS=1200

# Recommended LLM summary output length in tokens
SUMMARY_LENGTH_RECOMMENDED_=600

# Maximum context size sent to LLM for description summary
SUMMARY_CONTEXT_SIZE=12000

###############################
# Concurrency Configuration
###############################

# Max concurrency requests of LLM (for both query and document processing)
MAX_ASYNC=1

# Number of parallel processing documents
MAX_PARALLEL_INSERT=2

# Max concurrency requests for Embedding
EMBEDDING_FUNC_MAX_ASYNC=1

# Num of chunks send to Embedding in single request
EMBEDDING_BATCH_NUM=2

###########################################################
# LLM Configuration
###########################################################

# LLM request timeout setting for all llm
LLM_TIMEOUT=360

LLM_BINDING=ollama
LLM_MODEL=gpt-oss:20b_32k
LLM_BINDING_HOST=http://172.17.0.1:27171
LLM_BINDING_API_KEY=1
OLLAMA_LLM_TEMPERATURE=0.0

####################################################################################
# Embedding Configuration
####################################################################################
EMBEDDING_TIMEOUT=330
EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=qwen3-embedding:8b-q8_0
EMBEDDING_DIM=4096
EMBEDDING_BINDING_API_KEY=2
EMBEDDING_BINDING_HOST=http://172.17.0.1:27171

# Optional for Ollama embedding
OLLAMA_EMBEDDING_NUM_CTX=8192

####################################################################
# WORKSPACE Configuration
####################################################################
WORKSPACE=arch

############################
# Data storage selection
############################

# Default storage (Recommended for small scale deployment)
LIGHTRAG_KV_STORAGE=JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage

Logs and screenshots

Error Log

Processing d-id: doc-f7e3760cac75f182288387f0a351c618
Failed to extract entities and relationships: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:
Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
    yield
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
    raise exc from None
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
    raise exc
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 106, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 177, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/app/.venv/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/operate.py", line 2706, in _process_with_semaphore
    return await _process_single_content(chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/operate.py", line 2599, in _process_single_content
    final_result, timestamp = await use_llm_func_with_cache(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 1698, in use_llm_func_with_cache
    res: str = await use_llm_func(
               ^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 847, in wait_func
    return await future
           ^^^^^^^^^^^^
  File "/app/lightrag/utils.py", line 551, in worker
    result = await asyncio.wait_for(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/app/lightrag/llm/ollama.py", line 135, in ollama_model_complete
    return await _ollama_model_if_cache(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                      ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/llm/ollama.py", line 109, in _ollama_model_if_cache
    raise e
  File "/app/lightrag/llm/ollama.py", line 72, in _ollama_model_if_cache
    response = await ollama_client.chat(model=model, messages=messages, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 953, in chat
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 751, in _request
    return cls((await self._request_raw(*args, **kwargs)).json())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/ollama/_client.py", line 691, in _request_raw
    r = await self._client.request(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1540, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_client.py", line 1730, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
    with map_httpcore_exceptions():
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/app/.venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/operate.py", line 2710, in _process_with_semaphore
    raise prefixed_exception from e
httpx.ReadTimeout: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/lightrag.py", line 1804, in process_document
    await entity_relation_task
  File "/app/lightrag/lightrag.py", line 2031, in _process_extract_entities
    raise e
  File "/app/lightrag/lightrag.py", line 2016, in _process_extract_entities
    chunk_results = await extract_entities(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/operate.py", line 2752, in extract_entities
    raise prefixed_exception from first_exception
httpx.ReadTimeout: C[1/3]: chunk-a8cd18b704f1a7be85b1af0cbf83f82d:

Additional Information

  • LightRAG Version: 1.4.9.4
  • Operating System: Ubuntu 24.04.3 LTS
  • Python Version: 3.12

Related Issues

No related issues mentioned

superuely avatar Oct 24 '25 10:10 superuely

The error indicates that the LLM invocation timed out. This is likely due to insufficient local computational resources. LightRAG recommends a minimum context length of 32KB for the LLM. By default, LightRAG is configured with a concurrency limit of 6 for LLM requests. However, Ollama’s default concurrency is set to 1, which can easily lead to timeouts under concurrent workloads.

Please ensure your local hardware resources are compatible with LightRAG’s configuration. Key settings to verify include:

###############################
### Concurrency Configuration
###############################
### Max concurrency requests of LLM (for both query and document processing)
MAX_ASYNC=6
### Number of parallel processing documents(between 2~10, MAX_ASYNC/3 is recommended)
MAX_PARALLEL_INSERT=3
### Max concurrency requests for Embedding
EMBEDDING_FUNC_MAX_ASYNC=8
### Num of chunks send to Embedding in single request
EMBEDDING_BATCH_NUM=16

### OLLAMA_LLM_NUM_CTX must be larger than MAX_TOTAL_TOKENS + 2000
OLLAMA_LLM_NUM_CTX=8192

########################################
### Document processing configuration
########################################
### Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended)
# FORCE_LLM_SUMMARY_ON_MERGE=8
### Max description token size to trigger LLM summary
# SUMMARY_MAX_TOKENS = 1200
### Recommended LLM summary output length in tokens
# SUMMARY_LENGTH_RECOMMENDED=600
### Maximum context size sent to LLM for description summary
# SUMMARY_CONTEXT_SIZE=12000

########################
### Query Configuration
########################
# LLM response cache for query (not valid for streaming responses)
ENABLE_LLM_CACHE=true
# COSINE_THRESHOLD=0.2
### Number of entities or relations retrieved from KG
# TOP_K=40
### Maximum number of chunks planned to be sent to LLM
# CHUNK_TOP_K=20
### Control the actual entities sent to LLM
# MAX_ENTITY_TOKENS=6000
### Control the actual relations sent to LLM
# MAX_RELATION_TOKENS=8000
### Control the maximum tokens sent to LLM (including entities, relations and chunks)
# MAX_TOTAL_TOKENS=30000
### Maximum number of related chunks per source entity or relation (higher values increase re-ranking time)
# RELATED_CHUNK_NUMBER=5
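
Note that MAX_ASYNC and EMBEDDING_FUNC_MAX_ASYNC only control how many requests LightRAG issues; how many requests Ollama actually serves in parallel is set on the Ollama server itself. A sketch of the relevant Ollama server variables (illustrative values; these go in the environment of the Ollama service, not in LightRAG's .env):

### Set on the host/container running Ollama, not in LightRAG's .env
OLLAMA_NUM_PARALLEL=4          ### parallel requests per loaded model (the default concurrency of 1 mentioned above)
OLLAMA_MAX_LOADED_MODELS=2     ### let the LLM and the embedding model stay loaded side by side
OLLAMA_KEEP_ALIVE=-1           ### keep models resident instead of unloading them between requests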

danielaskdd avatar Oct 29 '25 17:10 danielaskdd

When GPU memory is insufficient, the Ollama server will repeatedly unload and reload both the embedding and LLM models, severely degrading performance. We therefore recommend using vLLM or sglang for model deployment to achieve better efficiency and stability.
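
For reference, one rough way to watch for that unload/reload churn while a document is being indexed is to poll Ollama's process list (a sketch only, assuming a recent Ollama that exposes /api/ps, the endpoint behind `ollama ps`; the host is the one from the .env above):

import time
import httpx

OLLAMA_HOST = "http://172.17.0.1:27171"

# Print which models are resident (and their VRAM footprint) every 10 seconds for ~5 minutes.
for _ in range(30):
    models = httpx.get(f"{OLLAMA_HOST}/api/ps", timeout=10).json().get("models", [])
    loaded = {m["name"]: f'{m.get("size_vram", 0) / 2**30:.1f} GiB' for m in models}
    print(time.strftime("%H:%M:%S"), loaded)
    time.sleep(10)
# If the LLM and the embedding model keep replacing each other, every request pays a full reload.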

danielaskdd avatar Oct 29 '25 17:10 danielaskdd