
[Question]: Gunicorn errors raise two questions

Open ntsarb opened this issue 7 months ago • 4 comments

Do you need to ask a question?

  • [x] I have searched the existing questions and discussions and this question is not already answered.
  • [x] I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hello, I imported several text documents and launched the server using:

lightrag-gunicorn --workers 4

The process ingested the first text file (an unhandled race condition prevented the parallel ingestion of two documents), proceeded to the deduplication steps, and completed them successfully.

Then it ingested two text files in parallel, but after the deduplication steps it stopped with an error message suggesting it may have run out of memory. (I presume system RAM; 180 GB is allocated via WSL2 and mostly unused.)

After that, when I attempt to re-run the server in the same way, I get the errors below.

Note that I have since tested running the server with lightrag-server --workers 4 and this appears to work (it is currently running), but it is re-running the entity and relationship extraction for the documents I had already processed, wasting the several hours that work took.

Hence, this is to raise two questions:

  1. Is it possible to figure out what causes the server to misbehave and handle the relevant exceptions gracefully, so that no data is lost?

  2. Does it make sense for the LightRAG process to save intermediate outputs (which can be costly to produce) to the SSD before proceeding with the next step?
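The idea behind question 2 can be sketched as stage-level checkpointing: persist each stage's (costly) output to disk before starting the next stage, so a crash never loses completed work. This is an illustrative sketch only, not LightRAG's actual pipeline code; the `run_stage` helper and checkpoint directory are hypothetical.

```python
# Illustrative sketch of stage checkpointing -- not LightRAG's actual code.
# Each stage's output is persisted before the next stage starts, so a
# crashed run can resume instead of recomputing hours of extraction work.
import json
import os

def run_stage(name, compute, checkpoint_dir):
    """Return the cached output for `name` if present, else compute and persist it."""
    path = os.path.join(checkpoint_dir, f"{name}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)          # resume: skip the expensive recomputation
    result = compute()
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(result, f)
    os.replace(tmp, path)                # atomic rename: no half-written checkpoints
    return result
```

A pipeline would then call e.g. `run_stage("extract", lambda: extract_entities(chunks), ckpt_dir)` for each stage; the atomic rename means a worker killed mid-write leaves only a `.tmp` file behind, never a corrupt checkpoint.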

Please let me know if there is additional information I can provide to help resolve this.

Additional Context

(lightrag) ntsarb@myhostname:~/LightRAG$ lightrag-gunicorn --workers 4
2025-05-16 15:11:21 - pipmaster.package_manager - INFO - Targeting pip associated with Python: /usr/bin/python3 | Command base: /usr/bin/python3 -m pip

╔══════════════════════════════════════════════════════════════╗
║                  🚀 LightRAG Server v1.3.7/0170              ║
║          Fast, Lightweight RAG Server Implementation         ║
╚══════════════════════════════════════════════════════════════╝

📡 Server Configuration: ├─ Host: 0.0.0.0 ├─ Port: 9621 ├─ Workers: 4 ├─ CORS Origins: * ├─ SSL Enabled: False ├─ Ollama Emulating Model: lightrag:latest ├─ Log Level: INFO ├─ Verbose Debug: False ├─ History Turns: 3 ├─ API Key: Not Set └─ JWT Auth: Disabled

📂 Directory Configuration: ├─ Working Directory: /home/ntsarb/LightRAG/rag_storage └─ Input Directory: /home/ntsarb/LightRAG/inputs

🤖 LLM Configuration: ├─ Binding: ollama ├─ Host: http://localhost:11434 ├─ Model: llama3.3:70b-instruct-q8_0 ├─ Temperature: 0.2 ├─ Max Async for LLM: 4 ├─ Max Tokens: 32768 ├─ Timeout: None (infinite) ├─ LLM Cache Enabled: True └─ LLM Cache for Extraction Enabled: True

📊 Embedding Configuration: ├─ Binding: ollama ├─ Host: http://localhost:11434 ├─ Model: bge-m3:latest └─ Dimensions: 1024

⚙️ RAG Configuration: ├─ Summary Language: English ├─ Max Parallel Insert: 2 ├─ Max Embed Tokens: 8192 ├─ Chunk Size: 1200 ├─ Chunk Overlap Size: 100 ├─ Cosine Threshold: 0.2 ├─ Top-K: 60 ├─ Max Token Summary: 500 └─ Force LLM Summary on Merge: 6

💾 Storage Configuration: ├─ KV Storage: JsonKVStorage ├─ Vector Storage: NanoVectorDBStorage ├─ Graph Storage: NetworkXStorage └─ Document Status Storage: JsonDocStatusStorage

✨ Server starting up...

🌐 Server Access Information: ├─ WebUI (local): http://localhost:9621 ├─ Remote Access: http://:9621 ├─ API Documentation (local): http://localhost:9621/docs └─ Alternative Documentation (local): http://localhost:9621/redoc

📝 Note: Since the server is running on 0.0.0.0: - Use 'localhost' or '127.0.0.1' for local access - Use your machine's IP address for remote access - To find your IP address: • Windows: Run 'ipconfig' in terminal • Linux/Mac: Run 'ifconfig' or 'ip addr' in terminal

🚀 Starting LightRAG with Gunicorn 🔄 Worker management: Gunicorn (workers=4) 🔍 Preloading app: Enabled 📝 Note: Using Gunicorn's preload feature for shared data initialization

================================================================================ MAIN PROCESS INITIALIZATION Process ID: 783 Workers setting: 4

INFO: Process 783 Shared-Data created for Multiple Process (workers=4)

Starting Gunicorn with direct Python API... INFO: Process 783 Shared-Data already initialized (multiprocess=True) 2025-05-16 15:11:24,362 [INFO] lightrag: Loaded graph from /home/ntsarb/LightRAG/rag_storage/graph_chunk_entity_relation.graphml with 131 nodes, 124 edges 2025-05-16 15:11:24,374 [INFO] nano-vectordb: Load (131, 1024) data 2025-05-16 15:11:24,375 [INFO] nano-vectordb: Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/ntsarb/LightRAG/rag_storage/vdb_entities.json'} 131 data 2025-05-16 15:11:24,380 [INFO] nano-vectordb: Load (124, 1024) data 2025-05-16 15:11:24,380 [INFO] nano-vectordb: Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/ntsarb/LightRAG/rag_storage/vdb_relationships.json'} 124 data 2025-05-16 15:11:24,381 [INFO] nano-vectordb: Load (12, 1024) data 2025-05-16 15:11:24,381 [INFO] nano-vectordb: Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': '/home/ntsarb/LightRAG/rag_storage/vdb_chunks.json'} 12 data 2025-05-16 15:11:24,430 [INFO] gunicorn.error: Starting gunicorn 23.0.0

================================================================================ GUNICORN MASTER PROCESS: on_starting jobs for 4 worker(s) Process ID: 783

Memory usage after initialization: 180.19 MB LightRAG log file: /home/ntsarb/LightRAG/lightrag.log

Gunicorn initialization complete, forking workers...

2025-05-16 15:11:24,443 [INFO] gunicorn.error: Listening at: http://0.0.0.0:9621 (783) 2025-05-16 15:11:24,443 [INFO] gunicorn.error: Using worker: uvicorn.workers.UvicornWorker 2025-05-16 15:11:24,447 [INFO] gunicorn.error: Booting worker with pid: 843 INFO: Process 843 initialized updated flags for namespace: [full_docs] INFO: Process 843 ready to initialize storage namespace: [full_docs] INFO: Process 843 KV load full_docs with 1 records INFO: Process 843 initialized updated flags for namespace: [text_chunks] INFO: Process 843 ready to initialize storage namespace: [text_chunks] INFO: Process 843 KV load text_chunks with 12 records INFO: Process 843 initialized updated flags for namespace: [entities] INFO: Process 843 initialized updated flags for namespace: [relationships] INFO: Process 843 initialized updated flags for namespace: [chunks] INFO: Process 843 initialized updated flags for namespace: [chunk_entity_relation] INFO: Process 843 initialized updated flags for namespace: [llm_response_cache] INFO: Process 843 ready to initialize storage namespace: [llm_response_cache] INFO: Process 843 KV load llm_response_cache with 30 records 2025-05-16 15:11:24,542 [INFO] gunicorn.error: Booting worker with pid: 923 INFO: Process 843 initialized updated flags for namespace: [doc_status] INFO: Process 843 ready to initialize storage namespace: [doc_status] INFO: Process 843 doc status load doc_status with 7 records INFO: Process 923 storage namespace already initialized: [full_docs] INFO: Process 843 Pipeline namespace initialized INFO: Process 923 storage namespace already initialized: [text_chunks]

Server is ready to accept connections! 🚀

2025-05-16 15:11:24,628 [INFO] gunicorn.error: Booting worker with pid: 971 INFO: Process 923 storage namespace already initialized: [llm_response_cache] INFO: Process 923 storage namespace already initialized: [doc_status] INFO: Process 971 storage namespace already initialized: [full_docs] INFO: Process 971 storage namespace already initialized: [text_chunks]

Server is ready to accept connections! 🚀

INFO: Process 971 storage namespace already initialized: [llm_response_cache] INFO: Process 971 storage namespace already initialized: [doc_status]

Server is ready to accept connections! 🚀

2025-05-16 15:11:24,715 [INFO] gunicorn.error: Booting worker with pid: 1046 INFO: Process 1046 storage namespace already initialized: [full_docs] INFO: Process 1046 storage namespace already initialized: [text_chunks] INFO: Process 1046 storage namespace already initialized: [llm_response_cache] INFO: Process 1046 storage namespace already initialized: [doc_status]

Server is ready to accept connections! 🚀

INFO: 127.0.0.1:41512 - "POST /documents/scan HTTP/1.1" 200 INFO: Found 7 new files to index. INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: overview.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: analysis2.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: remedies1.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: remedies2.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: quality.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: volume-6.txt INFO: No new unique documents were found. INFO: Successfully fetched and enqueued file: instrumentation.txt INFO: Processing 6 document(s) INFO: Extracting stage 1/6: analysis2.txt INFO: Processing d-id: doc-3a627fb560124c574869c239518f0d22 INFO: Extracting stage 2/6: remedies1.txt INFO: Processing d-id: doc-fc4a2be5501319f238edcb86bd491f4e INFO: limit_async: 16 new workers initialized INFO: limit_async: 4 new workers initialized 2025-05-16 15:12:26,901 [CRITICAL] gunicorn.error: WORKER TIMEOUT (pid:1046) 2025-05-16 15:12:27,910 [ERROR] gunicorn.error: Worker (pid:1046) was sent SIGKILL! Perhaps out of memory? 2025-05-16 15:12:27,914 [INFO] gunicorn.error: Booting worker with pid: 1523 INFO: Process 1523 storage namespace already initialized: [full_docs] INFO: Process 1523 storage namespace already initialized: [text_chunks] INFO: Process 1523 storage namespace already initialized: [llm_response_cache] INFO: Process 1523 storage namespace already initialized: [doc_status]

Server is ready to accept connections! 🚀

ntsarb avatar May 16 '25 14:05 ntsarb

  1. When multiple files are uploaded, once the first file upload is complete, the server immediately initiates a processing job, resulting in subsequent files being queued until the initial processing job is finished.

  2. I'm not certain whether WSL2 supports Gunicorn. It is recommended to run Gunicorn mode directly in a native Linux environment for better compatibility.

  3. LightRAG enables LLM caching by default, which significantly accelerates the reprocessing of previously failed files compared to the initial processing.
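The caching behaviour described in point 3 can be pictured as keying LLM responses by a hash of the prompt, so re-running extraction over an already-processed document hits the cache instead of calling the LLM again. A minimal sketch under that assumption (illustrative only, not LightRAG's implementation):

```python
# Illustrative sketch of an LLM response cache -- not LightRAG's actual code.
# Responses are keyed by a hash of the prompt; a repeated prompt returns the
# stored answer instead of paying the LLM cost again.
import hashlib
import json

class LLMCache:
    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.store = json.load(f)
        except FileNotFoundError:
            self.store = {}

    def get_or_call(self, prompt, llm_func):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.store:              # cache miss: call the LLM once
            self.store[key] = llm_func(prompt)
            with open(self.path, "w") as f:    # persist so restarts keep the cache
                json.dump(self.store, f)
        return self.store[key]
```

Because the store is persisted to disk, a reprocessing run after a crash should hit the cache for every prompt the first run completed; if extraction is nonetheless recomputed from scratch (as reported below), the cache was evidently not consulted or not written.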

danielaskdd avatar May 17 '25 01:05 danielaskdd

I noticed that you are using a local Ollama as your LLM. LightRAG's default context window size is 32K, which is much larger than Ollama's default of 2K. This will cause Ollama's GPU memory usage to increase significantly, and the token output speed will also decrease sharply. In addition, we recommend that the parameter size of the LLM should not be less than 32B. You need to make sure that your server can support running an LLM with 32B parameters and a 32K context window. Otherwise, it is recommended to use an external LLM API with LightRAG to ensure reliable RAG results.

danielaskdd avatar May 17 '25 01:05 danielaskdd

I noticed that you are using a local Ollama as your LLM. LightRAG's default context window size is 32K, which is much larger than Ollama's default of 2K. This will cause Ollama's GPU memory usage to increase significantly, and the token output speed will also decrease sharply. In addition, we recommend that the parameter size of the LLM should not be less than 32B. You need to make sure that your server can support running an LLM with 32B parameters and a 32K context window. Otherwise, it is recommended to use an external LLM API with LightRAG to ensure reliable RAG results.

Thank you, this is useful. The model I am testing with is "llama3.3:70b-instruct-q8_0", which does have a 2K context window by default. I've now set the num_ctx parameter to 32768 and saved it under a new model name, so Ollama will launch it with a 32K context window.
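The num_ctx change described above is typically done with an Ollama Modelfile; a sketch of that approach (the derived model name `llama3.3-32k` is illustrative):

```shell
# Sketch only: derive a model with a 32K context window from the base model.
cat > Modelfile <<'EOF'
FROM llama3.3:70b-instruct-q8_0
PARAMETER num_ctx 32768
EOF
ollama create llama3.3-32k -f Modelfile
# then point LightRAG's LLM model setting at llama3.3-32k
```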

From your tests, have you identified a particular platform/server that delivers the best LLM performance (without compromising stability) for LightRAG?

ntsarb avatar May 17 '25 06:05 ntsarb

Thanks again!

  1. When multiple files are uploaded, once the first file upload is complete, the server immediately initiates a processing job, resulting in subsequent files being queued until the initial processing job is finished.

Sometimes it ingests two documents in parallel; other times it ingests only one (with the rest held in the queue), due to a race condition. I don't remember the exact message, but it was handled gracefully, which is great. If this is worth investigating, let me know what info you need and I can provide it.

  2. I'm not certain whether WSL2 supports Gunicorn. It is recommended to run Gunicorn mode directly in a native Linux environment for better compatibility.

At this stage, WSL2 is the only available option, and I haven't tested Gunicorn on WSL2 before. Thanks for raising it; I need to look into this more carefully.

  3. LightRAG enables LLM caching by default, which significantly accelerates the reprocessing of previously failed files compared to the initial processing.

I don't think the LLM cache was used in this case; I'm not sure why.

ntsarb avatar May 17 '25 07:05 ntsarb