
[Question]: The VRAM usage when running the model with ragflow is much higher than when running it with ollama run qwen3:14b.

Open 2731557667 opened this issue 7 months ago • 4 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

My original version 0.17 used 14GB of VRAM when calling qwen3:14b. After upgrading to ragflow 0.18, calling qwen3:14b used 21GB of VRAM. After upgrading to ragflow 0.19, calling qwen3:14b used 49GB of VRAM.

However, when I run ollama run qwen3:14b from the command line, it always only uses 14GB of VRAM.

[Screenshots attached: VRAM usage comparison]

2731557667 avatar May 26 '25 09:05 2731557667

Based on the information available, there are a few key factors that could explain the increased VRAM usage in ragflow versions 0.18 and 0.19 compared to version 0.17 and to running qwen3:14b directly with ollama run:

  • Version changes and processing strategy: In ragflow 0.19, the document processing strategy was changed to batch processing, where all documents complete embedding before being sent to the next stage. This can lead to higher peak VRAM usage, especially if multiple documents are processed in parallel, as opposed to the previous approach of processing one document at a time. This change could explain the significant jump in VRAM usage you observed between versions 0.18 and 0.19 [source].

  • Built-in models and resource requirements: Ragflow includes built-in models for document structure parsing and other deep document understanding tasks, which can require more computational resources (including VRAM) than simply running a model with ollama run. This is a known reason why ragflow may use more resources than other projects or direct model invocation [source].

  • Model reload issue in 0.18: There is a known issue in ragflow 0.18.0 where the model may reload with every request when used with Ollama, which could also contribute to increased VRAM usage due to repeated allocations [source].

  • No direct technical comparison: There is no specific documentation or technical comparison available that details the differences in VRAM usage between ragflow and ollama run. However, the extra features and processing steps in ragflow are likely contributing factors.
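The batch-processing point above can be illustrated with a toy memory model (this is not RAGFlow code, and the numbers are made up): with sequential processing, peak VRAM is driven by the largest single document, while batching everything through the embedding stage lets the peaks accumulate.

```python
# Toy model of peak VRAM (MB) under two processing strategies.
# Illustrative only -- not RAGFlow's actual memory accounting.

def peak_sequential(doc_sizes_mb):
    # One document at a time: the peak is the largest single document.
    return max(doc_sizes_mb)

def peak_batched(doc_sizes_mb):
    # All documents finish embedding before the next stage:
    # their working sets coexist, so the peaks add up.
    return sum(doc_sizes_mb)

docs = [1400, 1400, 1400]  # three documents, MB each (made-up values)
print(peak_sequential(docs))  # 1400
print(peak_batched(docs))     # 4200
```

The same multiplicative effect applies if the runtime additionally keeps several model instances resident to serve parallel requests.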

Potential ways to reduce VRAM usage:

  • Use the "slim" version of the ragflow Docker image, which does not load local embedding models and therefore occupies less RAM and VRAM [source].
  • If possible, adjust batch sizes or parallelism settings in your ragflow configuration to limit the number of documents processed simultaneously, which may help control peak VRAM usage (though specific configuration options for this were not found in the available documentation).
  • If you are not using ragflow's advanced document parsing features, consider whether a more minimal setup or direct use of ollama run is sufficient for your needs.
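Independent of RAGFlow's configuration, you can check what Ollama itself has loaded, and how much VRAM each model holds, via Ollama's documented `/api/ps` endpoint, which reports a `size_vram` field in bytes per loaded model. A minimal helper (illustrative, not part of RAGFlow; assumes Ollama on its default port):

```python
import json
import urllib.request

def vram_by_model(ps_json: dict) -> dict:
    """Map model name -> VRAM usage in GiB from an Ollama /api/ps payload."""
    return {
        m["name"]: m.get("size_vram", 0) / 2**30
        for m in ps_json.get("models", [])
    }

def fetch_loaded_models(host: str = "http://localhost:11434") -> dict:
    """Query a local Ollama server for its currently loaded models."""
    with urllib.request.urlopen(f"{host}/api/ps") as resp:
        return vram_by_model(json.load(resp))

# Parsing demo with a sample payload (14 GiB = 15032385536 bytes):
sample = {"models": [{"name": "qwen3:14b", "size_vram": 15032385536}]}
print(vram_by_model(sample))  # {'qwen3:14b': 14.0}
```

Comparing the output while RAGFlow is idle versus while it is parsing or chatting can show whether additional models (embedding, reranking) are being loaded alongside qwen3:14b, or whether multiple copies of the same model are resident.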

No existing issues or discussions were found that directly address the VRAM usage discrepancy between ragflow and ollama run. If you need more granular control over resource usage, you may need to experiment with ragflow's configuration or monitor for future updates that address this concern.

To continue talking to Dosu, mention @dosu-bot.



dosubot[bot] avatar May 26 '25 09:05 dosubot[bot]

Same problem.

konn-submarine-bu avatar May 30 '25 09:05 konn-submarine-bu


It's completely useless and doesn't address my issue.

2731557667 avatar Jun 07 '25 02:06 2731557667

I have the same problem. I cannot use ragflow 0.19 unless this problem is fixed.

RNGMARTIN avatar Jun 12 '25 00:06 RNGMARTIN