[Bug]: Document parsing completes, but the memory is not released.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
RAGFlow workspace code commit ID
0.14.1
RAGFlow image version
0.14.1
Other environment information
Ubuntu 24.04
Actual behavior
After uploading PDF documents for parsing and embedding, memory usage only increases and never decreases.
Expected behavior
No response
Steps to reproduce
Upload several large PDF documents.
Additional information
No response
Do you have an estimate of how many (kilo/mega)bytes it "leaks" per parsed document?
What about using a SaaS embedding model or an Ollama/Xinference-served embedding model?
> Do you have an estimate of how many (kilo/mega)bytes it "leaks" per parsed document?

The PDF documents I uploaded are approximately 40-50 MB each. Embedding a single PDF document consumes around 40 GB of memory.

> What about using a SaaS embedding model or an Ollama/Xinference-served embedding model?

Memory is still not released.
I noticed the same today. I set up RAGFlow on a cloud instance with 16 GB of RAM, and it was not enough to ingest ~15 PDFs of 10-20 pages each: RAM usage was already around 20 GB and was not released. This is still the case on master. When RAM is exhausted, for some reason the program doesn't crash; swap takes over and nothing responds anymore... This is a critical issue that will hinder deploying RAGFlow in production.
There is a method that temporarily mitigates the memory overflow RAGFlow causes during document embedding, but the memory leak itself remains. Set the environment variable TRACE_MALLOC_DELTA to 1, or modify it directly in the code like this: TRACE_MALLOC_DELTA = int(os.environ.get('TRACE_MALLOC_DELTA', "1")).
With this setting, memory no longer grows indefinitely during document embedding, but the memory that has already been used is still not released automatically after the embedding task completes.
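For reference, here is a minimal, hypothetical sketch of the kind of tracemalloc delta reporting such a flag can drive. It is not RAGFlow's actual implementation; the `run_with_delta_trace` helper and the `task` argument are made-up names. It wraps a parsing/embedding task, compares snapshots taken before and after, and prints the largest per-line allocation deltas so retained memory can be attributed to specific code.

```python
import os
import tracemalloc

# Hypothetical flag, read the same way the comment above shows.
TRACE_MALLOC_DELTA = int(os.environ.get("TRACE_MALLOC_DELTA", "1"))

def run_with_delta_trace(task, *args, **kwargs):
    """Run a task and, if enabled, log the biggest allocation deltas."""
    if not TRACE_MALLOC_DELTA:
        return task(*args, **kwargs)
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    try:
        return task(*args, **kwargs)
    finally:
        after = tracemalloc.take_snapshot()
        # Largest memory deltas between the two snapshots, grouped by source line.
        for stat in after.compare_to(before, "lineno")[:10]:
            print(stat)
        tracemalloc.stop()
```

This does not release anything by itself, but running a parse/embed task through such a wrapper at least shows where the memory that never comes back was allocated.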
Any updates on this? Thanks! The issue persists even when using an external embedding model, so I guess it's in the OCR or other steps?
I also encountered this problem. Parsing a 10-page PDF took up about 16 GB of memory, then it became unresponsive and the container kept restarting.
Could you test this with and without an OCR layer embedded in the file? Does it happen with PDFs only? Does it also occur when using deepdoc standalone?
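If it helps to narrow this down, below is a rough, hypothetical sketch (not RAGFlow code: `parse_one` is a placeholder for whatever deepdoc/OCR entry point you call, and psutil is assumed to be installed) that measures resident memory before and after running a single parse in a child process. If RSS drops back once the child exits, the memory is being held by the worker process rather than leaked system-wide, which points at the allocator or caches inside the worker rather than at the OS.

```python
import multiprocessing as mp
import psutil  # assumed available; used only to read the resident set size

def rss_mb() -> float:
    """Current process resident set size in MB."""
    return psutil.Process().memory_info().rss / 1024 / 1024

def parse_one(path: str) -> None:
    # Placeholder: call your parser here, e.g. a deepdoc/OCR entry point.
    ...

def parse_in_subprocess(path: str) -> None:
    # Running the parse in a child process guarantees the OS gets the memory
    # back when the child exits, which helps tell "leak" apart from
    # "allocator not returning freed pages to the OS".
    p = mp.Process(target=parse_one, args=(path,))
    p.start()
    p.join()

if __name__ == "__main__":
    print(f"before: {rss_mb():.0f} MB")
    parse_in_subprocess("sample.pdf")
    print(f"after subprocess parse: {rss_mb():.0f} MB")
```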
Same here. The memory is not released after embedding completes, even when using embedding API services.
Memory is still not released in version 0.15.0. Please help.