[Question]:
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
I currently have two 4090s. When I try to parse a document, it reports an error indicating that the model parameters are placed on different GPUs. Could you please tell me where I can change this configuration? Each GPU has 24 GB of VRAM, so the operation should fit on a single card.
log:
2025-03-31 11:22:45,078 INFO 28 set_progress(682b770c0ddf11f085710242ac130006), progress: -1, progress_msg: 11:22:45 [ERROR][Exception]: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
2025-03-31 11:22:45,078 ERROR 28 handle_task got exception for task {"id": "682b770c0ddf11f085710242ac130006", "doc_id": "e0580954092311f096ae0242ac120006", "from_page": 336, "to_page": 341, "retry_count": 0, "kb_id": "cdd3a654092211f0979b0242ac120006", "parser_id": "naive", "parser_config": {"pages": [[1, 1000000]]}, "name": "05_RKNN_Compiler_Support_Operator_List_V2.3.0.pdf", "type": "pdf", "location": "05_RKNN_Compiler_Support_Operator_List_V2.3.0.pdf", "size": 3709706, "tenant_id": "099d0360089211f0ac5e0242ac120006", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"pages": [[1, 1000000]]}, "img2txt_id": "qwen-vl-plus@Tongyi-Qianwen", "asr_id": "", "llm_id": "qwen2.5:7b@Ollama", "update_time": 1743391362571, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 600, in handle_task
    await do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 502, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x77ceac5d5360>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 237, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x77cebb9b1cf0>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 104, in encode
    ress.extend(self._model.encode(texts[i:i + batch_size]).tolist())
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
    last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 177, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
Hello, @fengshengting! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to resolve this issue.
The error you're encountering suggests that the model parameters and buffers are not consistently placed on the same GPU, which is causing a mismatch. To address this, you can check the device configuration in the rag/settings.py file. This file contains the following code that determines the number of available GPUs:
PARALLEL_DEVICES = None
try:
    import torch.cuda
    PARALLEL_DEVICES = torch.cuda.device_count()
    logging.info(f"found {PARALLEL_DEVICES} gpus")
except Exception:
    logging.info("can't import package 'torch'")
To ensure that the model runs on a single GPU, you might need to explicitly set the device for the model parameters and buffers. You can do this by specifying the device in your model code, typically using model.to('cuda:0') to move the model to the first GPU. Additionally, ensure that any data or inputs are also moved to the same device using input.to('cuda:0').
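As a minimal sketch of that idea (using the Hugging Face transformers API and the bge model name from the task log; RAGFlow itself loads this model through FlagEmbedding, as the traceback shows):

import torch
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda:0")
# Model name taken from the embd_id in the log above.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-zh-v1.5").to(device)

inputs = tokenizer(["ok"], return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}  # inputs on the same GPU
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state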
If the code uses torch.nn.parallel.DataParallel, you might need to adjust the device_ids parameter to ensure it only includes the desired GPU, like device_ids=[0].
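For illustration only (a toy module, not RAGFlow's actual wrapper), constraining DataParallel to one device looks like this; with a single device_id it behaves like running the module on cuda:0 directly:

import torch
import torch.nn as nn

model = nn.Linear(16, 16).to("cuda:0")
# device_ids=[0] keeps the replica and its buffers on cuda:0 only.
parallel = nn.DataParallel(model, device_ids=[0], output_device=0)

x = torch.randn(4, 16, device="cuda:0")
y = parallel(x)  # no cross-device scatter, so no device-mismatch error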
These changes should help ensure that all model parameters and buffers are consistently placed on the same GPU, avoiding the mismatch error.
What's the version of RAGFlow?
Now I have changed the count in docker-compose-gpu.yml to 1 and forced settings.py to use only one GPU, which works. However, I still want to use multiple GPUs. My version of RAGFlow:

CONTAINER ID   IMAGE                                                           COMMAND                  CREATED          STATUS                    PORTS                                                                                                                 NAMES
0049e0123851   registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly    "./entrypoint.sh"        44 minutes ago   Up 44 minutes             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp   ragflow-server
2177a43be588   mysql:8.0.39                                                    "docker-entrypoint.s…"   44 minutes ago   Up 44 minutes (healthy)   33060/tcp, 0.0.0.0:5455->3306/tcp, :::5455->3306/tcp                                                                  ragflow-mysql
54504f3cc315   quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z                "/usr/bin/docker-ent…"   44 minutes ago   Up 44 minutes             0.0.0.0:9000-9001->9000-9001/tcp, :::9000-9001->9000-9001/tcp                                                         ragflow-minio
c280618e7efe   elasticsearch:8.11.3                                            "/bin/tini -- /usr/l…"   44 minutes ago   Up 43 minutes (healthy)   9300/tcp, 0.0.0.0:1200->9200/tcp, :::1200->9200/tcp                                                                   ragflow-es-01
5aa76f5237d9   valkey/valkey:8                                                 "docker-entrypoint.s…"   44 minutes ago   Up 44 minutes             0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                                                                             ragflow-redis
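For reference, pinning a single GPU in docker-compose-gpu.yml looks roughly like this (a sketch of the standard Compose GPU reservation; field names come from the Compose specification, not from RAGFlow):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1          # expose only one GPU to the container
          capabilities: [gpu]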
I have the same problem, using the latest version v0.17.2.
- compose settings:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["4", "5", "6", "7"]
capabilities: [gpu]
- task_executor error:
2025-04-03 18:14:09,602 INFO 37 set_progress(5f9064f8107411f0be790242ac130006), progress: -1, progress_msg: 18:14:09 Page(1~13): [ERROR]Fail to bind embedding model: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
2025-04-03 18:14:09,602 ERROR 37 Fail to bind embedding model: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 502, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f91945d7130>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 222, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.YoudaoEmbed.encode) at 0x7f91a1798a60>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 368, in encode
    embds = YoudaoEmbed._client.encode(texts[i:i + batch_size])
  File "/ragflow/.venv/lib/python3.10/site-packages/BCEmbedding/models/embedding.py", line 94, in encode
    outputs = self.model(**inputs_on_device, return_dict=True)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 177, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
Same problem; updated to the nightly version, still doesn't work.
Same problem with the nightly version.
It works well after downgrading from 0.18.0 to 0.17.2.
Version v0.20.4 (full): this problem still exists.
2025-09-04 11:48:16,481 ERROR 33 handle_task got exception for task {"id": "f7954638894111f08d700242ac170006", "doc_id": "e8df8ca2894111f093580242ac170006", "from_page": 612, "to_page": 617, "retry_count": 0, "kb_id": "6b65315488a711f097ad0242ac120006", "parser_id": "manual", "parser_config": {"pages": [[1, 1000000]], "task_page_size": 12, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "raptor": {"use_raptor": false, "prompt": "\u8bf7\u603b\u7ed3\u4ee5\u4e0b\u6bb5\u843d\u3002 \u5c0f\u5fc3\u6570\u5b57\uff0c\u4e0d\u8981\u7f16\u9020\u3002 \u6bb5\u843d\u5982\u4e0b\uff1a\n {cluster_content}\n\u4ee5\u4e0a\u5c31\u662f\u4f60\u9700\u8981\u603b\u7ed3\u7684\u5185\u5bb9\u3002", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {}, "entity_types": []}, "name": "\u6574\u5408\u77e5\u8bc6\u5e93\uff08\u542b\u624b\u518c\uff09.pdf", "type": "pdf", "location": "\u6574\u5408\u77e5\u8bc6\u5e93\uff08\u542b\u624b\u518c\uff09.pdf", "size": 52332893, "tenant_id": "5dff8a8288a711f0a42f0242ac120006", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"pages": [[1, 1000000]]}, "img2txt_id": "qwen-vl-plus@Tongyi-Qianwen", "asr_id": "qwen-audio-asr@Tongyi-Qianwen", "llm_id": "qwen-plus@Tongyi-Qianwen", "update_time": 1756957686871, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 671, in handle_task
    await do_handle_task(task)
  File "/ragflow/api/utils/api_utils.py", line 693, in async_wrapper
    return await func(*args, **kwargs)
  File "/ragflow/rag/svr/task_executor.py", line 554, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f891059e0e0>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 99, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x7f890c89f2e0>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 122, in encode
    ress = self._model.encode(texts[i : i + batch_size], convert_to_numpy=True)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
    last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7