[Bug] 2-GPU inference of internvl2-26b fails when the GPUs communicate over PCIe but succeeds over NVLink. Why is that?
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Code:

    backend_config = TurbomindEngineConfig(session_len=8192, tp=2, max_batch_size=2)
    pipe = pipeline(model, backend_config=backend_config)

Error:

    RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/allocator.h:246
Reproduction
python run.py
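run.py is not attached, so below is a minimal sketch of what it presumably contains, assembled from the snippet in the bug description. The model path, image URL, and prompt are illustrative assumptions; only the TurbomindEngineConfig arguments come from the report.

```python
# Hypothetical run.py (model path, image URL, and prompt are assumptions);
# only the TurbomindEngineConfig arguments are taken from the report.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2-26B'  # assumed model id for "internvl2-26b"

# Tensor parallelism across the 2 A800 GPUs, as in the report.
backend_config = TurbomindEngineConfig(session_len=8192, tp=2, max_batch_size=2)
pipe = pipeline(model, backend_config=backend_config)

# Example vision-language query.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```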
Environment
A800 80GB
2 GPUs (inter-GPU link: PCIe in the failing case, NVLink in the working case)
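Since the reported difference is PCIe vs NVLink between the two GPUs, it may help to also attach the output of `nvidia-smi topo -m` and of a quick PyTorch check like the sketch below (a diagnostic suggestion, not part of the original report), showing whether peer-to-peer access is available and how much memory is free on each device.

```python
# Diagnostic sketch (not from the original report): check peer access and free
# memory on the two GPUs before launching the tp=2 pipeline.
import torch

assert torch.cuda.device_count() >= 2, 'expected 2 visible GPUs'

# Whether CUDA peer-to-peer access is possible between GPU 0 and GPU 1
# (over NVLink or PCIe, depending on the topology).
print('P2P 0->1:', torch.cuda.can_device_access_peer(0, 1))
print('P2P 1->0:', torch.cuda.can_device_access_peer(1, 0))

# Free / total memory per device, to see how much headroom is left
# before the TurboMind allocator reports out-of-memory.
for dev in range(2):
    free, total = torch.cuda.mem_get_info(dev)
    print(f'GPU {dev}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB')
```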
Error traceback
x