[Question]:
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
I currently have two 4090s. When I try to parse a document, it reports an error indicating that the model parameters are placed on different GPUs. Could you please tell me where I can change this configuration? Each GPU has 24 GB of VRAM, so the operation should fit on a single card.
log:
2025-03-31 11:22:45,078 INFO 28 set_progress(682b770c0ddf11f085710242ac130006), progress: -1, progress_msg: 11:22:45 [ERROR][Exception]: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
2025-03-31 11:22:45,078 ERROR 28 handle_task got exception for task {"id": "682b770c0ddf11f085710242ac130006", "doc_id": "e0580954092311f096ae0242ac120006", "from_page": 336, "to_page": 341, "retry_count": 0, "kb_id": "cdd3a654092211f0979b0242ac120006", "parser_id": "naive", "parser_config": {"pages": [[1, 1000000]]}, "name": "05_RKNN_Compiler_Support_Operator_List_V2.3.0.pdf", "type": "pdf", "location": "05_RKNN_Compiler_Support_Operator_List_V2.3.0.pdf", "size": 3709706, "tenant_id": "099d0360089211f0ac5e0242ac120006", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"pages": [[1, 1000000]]}, "img2txt_id": "qwen-vl-plus@Tongyi-Qianwen", "asr_id": "", "llm_id": "qwen2.5:7b@Ollama", "update_time": 1743391362571, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 600, in handle_task
    await do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 502, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x77ceac5d5360>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 237, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x77cebb9b1cf0>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 104, in encode
    ress.extend(self._model.encode(texts[i:i + batch_size]).tolist())
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
    last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 177, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
Hello, @fengshengting! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to resolve this issue.
The error you're encountering suggests that the model parameters and buffers are not consistently placed on the same GPU, which is causing a mismatch. To address this, you can check the device configuration in the rag/settings.py file. This file contains the following code that determines the number of available GPUs:
PARALLEL_DEVICES = None
try:
    import torch.cuda
    PARALLEL_DEVICES = torch.cuda.device_count()
    logging.info(f"found {PARALLEL_DEVICES} gpus")
except Exception:
    logging.info("can't import package 'torch'")
To ensure that the model runs on a single GPU, you might need to explicitly set the device for the model parameters and buffers. You can do this by specifying the device in your model code, typically using model.to('cuda:0') to move the model to the first GPU. Additionally, ensure that any data or inputs are also moved to the same device using input.to('cuda:0').
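As a minimal sketch of that idea (using the Hugging Face transformers API and the bge model name from the task log; RAGFlow itself loads this model through FlagEmbedding, as the traceback shows):

import torch
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda:0")
# Model name taken from the embd_id in the log above.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-zh-v1.5").to(device)

inputs = tokenizer(["ok"], return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}  # inputs on the same GPU
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state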
If the code uses torch.nn.parallel.DataParallel, you might need to adjust the device_ids parameter to ensure it only includes the desired GPU, like device_ids=[0].
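For illustration only (a toy module, not RAGFlow's actual wrapper), constraining DataParallel to one device looks like this; with a single device_id it behaves like running the module on cuda:0 directly:

import torch
import torch.nn as nn

model = nn.Linear(16, 16).to("cuda:0")
# device_ids=[0] keeps the replica and its buffers on cuda:0 only.
parallel = nn.DataParallel(model, device_ids=[0], output_device=0)

x = torch.randn(4, 16, device="cuda:0")
y = parallel(x)  # no cross-device scatter, so no device-mismatch error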
These changes should help ensure that all model parameters and buffers are consistently placed on the same GPU, avoiding the mismatch error.
What's the version of RAGFlow?
Now I have changed the count in docker-compose-gpu.yml to 1 and forced settings.py to use only one GPU, which works. However, I still want to use multiple GPUs. My version of RAGFlow:

CONTAINER ID   IMAGE                                                           COMMAND                  CREATED          STATUS                    PORTS                                                                                                                 NAMES
0049e0123851   registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly    "./entrypoint.sh"        44 minutes ago   Up 44 minutes             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp   ragflow-server
2177a43be588   mysql:8.0.39                                                    "docker-entrypoint.s…"   44 minutes ago   Up 44 minutes (healthy)   33060/tcp, 0.0.0.0:5455->3306/tcp, :::5455->3306/tcp                                                                  ragflow-mysql
54504f3cc315   quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z                "/usr/bin/docker-ent…"   44 minutes ago   Up 44 minutes             0.0.0.0:9000-9001->9000-9001/tcp, :::9000-9001->9000-9001/tcp                                                         ragflow-minio
c280618e7efe   elasticsearch:8.11.3                                            "/bin/tini -- /usr/l…"   44 minutes ago   Up 43 minutes (healthy)   9300/tcp, 0.0.0.0:1200->9200/tcp, :::1200->9200/tcp                                                                   ragflow-es-01
5aa76f5237d9   valkey/valkey:8                                                 "docker-entrypoint.s…"   44 minutes ago   Up 44 minutes             0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                                                                             ragflow-redis
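For reference, pinning a single GPU in docker-compose-gpu.yml looks roughly like this (a sketch of the standard Compose GPU reservation; field names come from the Compose specification, not from RAGFlow):

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1          # expose only one GPU to the container
          capabilities: [gpu]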
I have the same problem, using the latest version v0.17.2.
- compose settings:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["4", "5", "6", "7"]
capabilities: [gpu]
- task_executor error:
2025-04-03 18:14:09,602 INFO 37 set_progress(5f9064f8107411f0be790242ac130006), progress: -1, progress_msg: 18:14:09 Page(1~13): [ERROR]Fail to bind embedding model: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
2025-04-03 18:14:09,602 ERROR 37 Fail to bind embedding model: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 502, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f91945d7130>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 222, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.YoudaoEmbed.encode) at 0x7f91a1798a60>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 368, in encode
    embds = YoudaoEmbed._client.encode(texts[i:i + batch_size])
  File "/ragflow/.venv/lib/python3.10/site-packages/BCEmbedding/models/embedding.py", line 94, in encode
    outputs = self.model(**inputs_on_device, return_dict=True)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 177, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7
Same problem; updated to the nightly version, still doesn't work.
Same problem with the nightly version.
It works well after downgrading from 0.18.0 to 0.17.2.
Version v0.20.4 (full): this problem still exists.
2025-09-04 11:48:16,481 ERROR 33 handle_task got exception for task {"id": "f7954638894111f08d700242ac170006", "doc_id": "e8df8ca2894111f093580242ac170006", "from_page": 612, "to_page": 617, "retry_count": 0, "kb_id": "6b65315488a711f097ad0242ac120006", "parser_id": "manual", "parser_config": {"pages": [[1, 1000000]], "task_page_size": 12, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "raptor": {"use_raptor": false, "prompt": "\u8bf7\u603b\u7ed3\u4ee5\u4e0b\u6bb5\u843d\u3002 \u5c0f\u5fc3\u6570\u5b57\uff0c\u4e0d\u8981\u7f16\u9020\u3002 \u6bb5\u843d\u5982\u4e0b\uff1a\n {cluster_content}\n\u4ee5\u4e0a\u5c31\u662f\u4f60\u9700\u8981\u603b\u7ed3\u7684\u5185\u5bb9\u3002", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {}, "entity_types": []}, "name": "\u6574\u5408\u77e5\u8bc6\u5e93\uff08\u542b\u624b\u518c\uff09.pdf", "type": "pdf", "location": "\u6574\u5408\u77e5\u8bc6\u5e93\uff08\u542b\u624b\u518c\uff09.pdf", "size": 52332893, "tenant_id": "5dff8a8288a711f0a42f0242ac120006", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"pages": [[1, 1000000]]}, "img2txt_id": "qwen-vl-plus@Tongyi-Qianwen", "asr_id": "qwen-audio-asr@Tongyi-Qianwen", "llm_id": "qwen-plus@Tongyi-Qianwen", "update_time": 1756957686871, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 671, in handle_task
    await do_handle_task(task)
  File "/ragflow/api/utils/api_utils.py", line 693, in async_wrapper
    return await func(*args, **kwargs)
  File "/ragflow/rag/svr/task_executor.py", line 554, in do_handle_task
    vts, _ = embedding_model.encode(["ok"])
  File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f891059e0e0>", line 31, in encode
  File "/ragflow/api/db/services/llm_service.py", line 99, in encode
    embeddings, used_tokens = self.mdl.encode(texts)
  File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x7f890c89f2e0>", line 31, in encode
  File "/ragflow/rag/llm/embedding_model.py", line 122, in encode
    ress = self._model.encode(texts[i : i + batch_size], convert_to_numpy=True)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
    last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in forward
    raise RuntimeError(
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:7