[Bug]: RAG-Anything not working with LightRAG Server
### Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
### Describe the bug

Followed the LightRAG Server install instructions, then installed raganything and LibreOffice. Used the scan function of the LightRAG Server WebUI to ingest .pptx, .docx, and .pdf files. The .pptx file causes an error during scanning, and data from images in the PDF and Word documents does not appear to be OCR'd and parsed.
### Steps to reproduce

1. Install `lightrag-hku[api]` from PyPI.
2. Install `raganything[all]`.
3. Install LibreOffice.
4. Run `lightrag-gunicorn`.
5. Put .pptx, .docx, and .pdf files in the input directory.
6. Run the scan function from the LightRAG Server WebUI.
### Expected Behavior

PPTX files are indexed without errors, and images in Word and PDF files are OCR'd and their data is ingested.
### LightRAG Config Used
```
###########################
# Server Configuration
###########################
HOST=0.0.0.0
PORT=9621
WEBUI_TITLE='VALi Knowledge Base'
WEBUI_DESCRIPTION="LCM Validation Graph RAG System"
OLLAMA_EMULATING_MODEL_TAG=latest
WORKERS=4
# CORS_ORIGINS=http://localhost:3000,http://localhost:8080

# Optional SSL Configuration
# SSL=true
# SSL_CERTFILE=/path/to/cert.pem
# SSL_KEYFILE=/path/to/key.pem

# Directory Configuration (defaults to current working directory)
# Default value is ./inputs and ./rag_storage
INPUT_DIR=/mnt/smbshare
WORKING_DIR=./VALI_DB

# Max nodes returned from graph retrieval in the WebUI
# MAX_GRAPH_NODES=1000

# Logging level
# LOG_LEVEL=INFO
# VERBOSE=False
# LOG_MAX_BYTES=10485760
# LOG_BACKUP_COUNT=5
# Logfile location (defaults to current working directory)
# LOG_DIR=/path/to/log/directory

#####################################
# Login and API-Key Configuration
#####################################
# AUTH_ACCOUNTS='admin:admin123,user1:pass456'
# TOKEN_SECRET=Your-Key-For-LightRAG-API-Server
# TOKEN_EXPIRE_HOURS=48
# GUEST_TOKEN_EXPIRE_HOURS=24
# JWT_ALGORITHM=HS256

# API key to access the LightRAG Server API
LIGHTRAG_API_KEY=sdfasdf
WHITELIST_PATHS=/health,/api/*

########################
# Query Configuration
########################
# LLM response cache for queries (not valid for streaming responses)
# ENABLE_LLM_CACHE=true
# HISTORY_TURNS=0
# COSINE_THRESHOLD=0.2
# Number of entities or relations retrieved from the KG
# TOP_K=40
# Maximum number of chunks to send to the LLM
# CHUNK_TOP_K=10
# Controls the actual entities sent to the LLM
# MAX_ENTITY_TOKENS=10000
# Controls the actual relations sent to the LLM
# MAX_RELATION_TOKENS=10000
# Controls the maximum tokens sent to the LLM (includes entities, relations and chunks)
# MAX_TOTAL_TOKENS=32000
# Maximum related chunks grabbed from a single entity or relation
# RELATED_CHUNK_NUMBER=10

# Reranker configuration (set ENABLE_RERANK to true if a reranking model is configured)
ENABLE_RERANK=False
RERANK_MODEL=text-embedding-bge-reranker-v2-m3
RERANK_BINDING_HOST=http://localhost:1234/v1/embeddings
RERANK_BINDING_API_KEY=lmstudio

########################################
# Document processing configuration
########################################
# Language: English, Chinese, French, German ...
SUMMARY_LANGUAGE=English
ENABLE_LLM_CACHE_FOR_EXTRACT=true
# MAX_TOKENS: max tokens sent to the LLM for entity/relation summaries (less than the context size of the model)
# MAX_TOKENS=32000
# Chunk size for document splitting, 500~1500 is recommended
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=50

# Entity and relation summarization configuration
# Number of duplicated entities/edges to trigger LLM re-summary on merge (at least 3 is recommended)
# FORCE_LLM_SUMMARY_ON_MERGE=4
# Maximum number of entity extraction attempts for ambiguous content
# MAX_GLEANING=1

###############################
# Concurrency Configuration
###############################
# Max concurrent requests to the LLM (for both query and document processing)
# MAX_ASYNC=4
# Number of documents processed in parallel (between 2~10, MAX_ASYNC/4 is recommended)
# MAX_PARALLEL_INSERT=2
# Max concurrent requests for embedding
# EMBEDDING_FUNC_MAX_ASYNC=8
# Number of chunks sent to embedding in a single request
# EMBEDDING_BATCH_NUM=10

#######################
# LLM Configuration
#######################
# Timeout in seconds for the LLM, None for infinite timeout
# TIMEOUT=240
# Some models like o1-mini require temperature to be set to 1
# TEMPERATURE=1
# LLM binding type: openai, ollama, lollms, azure_openai
LLM_BINDING=azure_openai
LLM_MODEL=o4-mini
LLM_BINDING_HOST=adsfasdf
LLM_BINDING_API_KEY=asdfadf
# Set as the num_ctx option for Ollama LLMs
# OLLAMA_NUM_CTX=32768
# Optional for Azure
AZURE_OPENAI_API_VERSION=2024-12-01-preview
#AZURE_OPENAI_DEPLOYMENT=gpt-4o

####################################################################################
# Embedding Configuration (should not be changed after the first file is processed)
####################################################################################
# Embedding type: openai, ollama, lollms, azure_openai
EMBEDDING_BINDING=azure_openai
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072
EMBEDDING_BINDING_API_KEY=dfasdfsd
# If the embedding service is deployed within the same Docker stack, use host.docker.internal instead of localhost
# EMBEDDING_BINDING_HOST=dfasdf
# Maximum tokens sent to embedding for each chunk (no longer in use?)
# MAX_EMBED_TOKENS=8192
# Optional for Azure
AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
AZURE_EMBEDDING_API_VERSION=2024-10-21
AZURE_EMBEDDING_ENDPOINT=asdfasdf
AZURE_EMBEDDING_API_KEY=asfasf

############################
# Data storage selection
############################
# Default storage (recommended for small-scale deployment)
# LIGHTRAG_KV_STORAGE=JsonKVStorage
# LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
# LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
# LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
# Redis storage (recommended for production deployment)
# LIGHTRAG_KV_STORAGE=RedisKVStorage
# LIGHTRAG_DOC_STATUS_STORAGE=RedisDocStatusStorage
# Vector storage (recommended for production deployment)
# LIGHTRAG_VECTOR_STORAGE=MilvusVectorDBStorage
# LIGHTRAG_VECTOR_STORAGE=QdrantVectorDBStorage
# LIGHTRAG_VECTOR_STORAGE=FaissVectorDBStorage
# Graph storage (recommended for production deployment)
# LIGHTRAG_GRAPH_STORAGE=Neo4JStorage
# LIGHTRAG_GRAPH_STORAGE=MemgraphStorage
# PostgreSQL
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
# LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
# LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
# MongoDB (vector storage only available on Atlas Cloud)
# LIGHTRAG_KV_STORAGE=MongoKVStorage
# LIGHTRAG_DOC_STATUS_STORAGE=MongoDocStatusStorage
# LIGHTRAG_GRAPH_STORAGE=MongoGraphStorage
# LIGHTRAG_VECTOR_STORAGE=MongoVectorDBStorage

####################################################################
# WORKSPACE sets the workspace name for all storage types
# for the purpose of isolating data between LightRAG instances.
# Valid workspace name constraints: a-z, A-Z, 0-9, and _
####################################################################
WORKSPACE=VALKB

# PostgreSQL Configuration
POSTGRES_HOST=sgnt-dev-34sd
POSTGRES_PORT=5432
POSTGRES_USER=val_svc
POSTGRES_PASSWORD=val123
POSTGRES_DATABASE=val_lightrag_graph
POSTGRES_MAX_CONNECTIONS=12
# POSTGRES_WORKSPACE=forced_workspace_name

# Neo4j Configuration
NEO4J_URI=neo4j://sgnt-dev-sdfsd:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=snowbound-single-blinks
NEO4J_MAX_CONNECTION_POOL_SIZE=100
NEO4J_CONNECTION_TIMEOUT=30
NEO4J_CONNECTION_ACQUISITION_TIMEOUT=30
MAX_TRANSACTION_RETRY_TIME=30
# NEO4J_WORKSPACE=forced_workspace_name

# MongoDB Configuration
MONGO_URI=mongodb://root:root@localhost:27017/
#MONGO_URI=mongodb+srv://xxxx
MONGO_DATABASE=LightRAG
# MONGODB_WORKSPACE=forced_workspace_name

# Milvus Configuration
MILVUS_URI=http://localhost:19530
MILVUS_DB_NAME=lightrag
# MILVUS_USER=root
# MILVUS_PASSWORD=your_password
# MILVUS_TOKEN=your_token
# MILVUS_WORKSPACE=forced_workspace_name

# Qdrant
# QDRANT_URL=http://localhost:6333
# QDRANT_API_KEY=your-api-key
# QDRANT_WORKSPACE=forced_workspace_name

# Redis
REDIS_URI=redis://localhost:6379
REDIS_SOCKET_TIMEOUT=30
REDIS_CONNECT_TIMEOUT=10
REDIS_MAX_CONNECTIONS=100
REDIS_RETRY_ATTEMPTS=3
# REDIS_WORKSPACE=forced_workspace_name

# Memgraph Configuration
MEMGRAPH_URI=bolt://localhost:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
MEMGRAPH_DATABASE=memgraph
# MEMGRAPH_WORKSPACE=forced_workspace_name
```
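One thing worth ruling out, given how the paste above was flattened: if the real `.env` actually contains several assignments on a single line (e.g. `HOST=0.0.0.0 PORT=9621`), dotenv-style loaders will typically fold everything after the first `=` into one value instead of setting separate variables. A quick stdlib-only sketch to flag such lines (the heuristic regex is my own, not part of LightRAG):

```python
import re

# An uppercase dotenv-style key followed by "=".
ASSIGN = re.compile(r"\b[A-Z][A-Z0-9_]*=")

def suspicious_env_lines(text):
    """Return (line_no, line) pairs where one line appears to hold
    more than one KEY=VALUE assignment, which dotenv-style loaders
    would silently fold into a single value."""
    flagged = []
    for i, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if len(ASSIGN.findall(line)) > 1:
            flagged.append((i, line))
    return flagged
```

Running `suspicious_env_lines(open(".env").read())` on the server's working directory takes a second and rules out a malformed config before chasing parser bugs.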
### Logs and screenshots
No response
### Additional Information
- LightRAG Version: 1.4.5
- Operating System: RHEL 9
- Python Version: 3.13.5
- Related Issues:
RagAnything has not yet been integrated with the LightRAG Server and currently only functions with the sample code.
Had a similar issue before: sometimes the LightRAG backend doesn't properly propagate the request headers or payload structure RAG-Anything expects.

Quick checks worth trying:
- Make sure the API server isn't enforcing a token signature mismatch; I've seen this fail silently before.
- Check whether RAG-Anything expects `application/json` while LightRAG defaults to form-encoded payloads (or vice versa).
- CORS headers from LightRAG sometimes get dropped under nginx or another reverse proxy, which can affect the auth handshake or preflight behavior.

Also, double-check whether the endpoints are hardcoded inside RAG-Anything or read from config; you might be hitting the wrong internal route.

Happy to help debug more if you've got logs from the failed POSTs.
@onestardao Thanks for the response. For now, I'll just wait for RAG-Anything to be integrated into the server.
Thanks for the update — understood that you’re waiting for Raganything to be integrated with LightRAG Server. By the way, I’ve been compiling a list of similar integration and compatibility issues across various setups. If you’d like, I can share that list so you can check whether any of the known cases match your current situation. Would that be useful for you?
I've been trying to do the same thing (integrating RAG-Anything into the LightRAG server API), but unsuccessfully, and I ran into similar issues as the author. @onestardao, if you have any guidelines that could help, they would be much appreciated.
sure, please do. appreciate it!
You don’t need to change your infra for this — it’s a Semantic Firewall pattern (MIT-licensed) that can sit between RagAnything and the LightRAG server to normalize requests/responses without touching server code.
Ref: Problem Map No5 – Semantic ≠ Embedding
The idea is to intercept and validate at the semantic layer before hitting the backend, so it works even if integration timing or payload structures are inconsistent.
Same issue - tracking
Same issue here. If you could give this integration priority, it would be greatly appreciated, as it really makes a lot of sense.
Same issue here, posted as well on the RAG-Anything forum: https://github.com/HKUDS/RAG-Anything/issues/109
Are there any plans to integrate RAG-Anything into LightRAG server? It seems to be requested by many
I think this is crucial and would love to see an integration.
Really looking forward to the merger of raganything and the lightrag server