
[Bug] triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.

Open EvoNexusX opened this issue 1 year ago • 5 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. The bug has not been fixed in the latest version.
  • [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Hi, I tried to deploy DeepSeek-Coder-V2-Lite-Instruct with lmdeploy and got the following error: triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or num_stages may help. My machine has an A6000 48G, so deploying a 16B model should be feasible. My code:

            from lmdeploy import pipeline, PytorchEngineConfig

            backend_config = PytorchEngineConfig(tp=1, block_size=32)
            LLM = pipeline(self.MODEL_PATH, backend_config=backend_config)

I tried reducing block_size, switching to the turbomind backend, and adjusting cache_max_entry_count, but none of that helped.
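For reference, the settings mentioned above map onto the two engine configs roughly as follows (a sketch; the parameter values are illustrative, not recommendations):

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# PyTorch backend: a smaller paged-attention block size
backend_config = PytorchEngineConfig(tp=1, block_size=16)

# Or the TurboMind backend: shrink the KV cache's share of free GPU memory
# backend_config = TurbomindEngineConfig(cache_max_entry_count=0.4)

llm = pipeline("DeepSeek-Coder-V2-Lite-Instruct", backend_config=backend_config)
```

Note that cache_max_entry_count only limits KV-cache allocation; it does not change the shared-memory footprint of the attention kernel itself, which may be why it did not help here.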

Reproduction

1

Environment

1

Error traceback

1

EvoNexusX avatar Sep 12 '24 00:09 EvoNexusX

I can deploy it on vLLM without problems.

EvoNexusX avatar Sep 12 '24 00:09 EvoNexusX

https://github.com/InternLM/lmdeploy/blob/edcdd8e36520b8bf7dbc99feecd2d2822c4cb5ba/lmdeploy/pytorch/kernels/cuda/pagedattention.py#L35

https://github.com/InternLM/lmdeploy/blob/edcdd8e36520b8bf7dbc99feecd2d2822c4cb5ba/lmdeploy/pytorch/kernels/cuda/pagedattention.py#L583 You could try changing `num_stages` to 1 in these places.

grimoire avatar Sep 12 '24 05:09 grimoire
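The suggestion above rests on how Triton's software pipelining uses shared memory: each pipeline stage keeps its own copy of the tiles it is loading, so the requirement grows roughly linearly with `num_stages` and with the block size. A rough pure-Python model of that scaling (the formula and tile sizes here are illustrative assumptions, not lmdeploy's actual kernel parameters):

```python
def smem_bytes(block_n: int, head_dim: int, num_stages: int,
               dtype_bytes: int = 2) -> int:
    """Back-of-the-envelope shared-memory estimate for a pipelined
    attention kernel: one K tile and one V tile per stage, fp16 elements.
    This is an assumed model, not lmdeploy's real accounting."""
    return num_stages * 2 * block_n * head_dim * dtype_bytes

HW_LIMIT = 101376  # hardware limit from the error message

print(smem_bytes(128, 192, 2))  # with num_stages=2 the estimate can exceed the limit
print(smem_bytes(128, 192, 1))  # dropping to num_stages=1 halves it
```

Under this model, halving `num_stages` halves the shared-memory requirement, which is exactly why the error message suggests reducing block sizes or `num_stages`.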

I tried that change, but unfortunately it did not work.

EvoNexusX avatar Sep 14 '24 01:09 EvoNexusX

I also have this problem

nowayhere1 avatar Mar 05 '25 06:03 nowayhere1

> lmdeploy/lmdeploy/pytorch/kernels/cuda/pagedattention.py
>
> Line 35 in edcdd8e
>
>     triton.Config({}, num_stages=2, num_warps=16),
>
> lmdeploy/lmdeploy/pytorch/kernels/cuda/pagedattention.py
>
> Line 583 in edcdd8e
>
>     num_stages = 2
>
> You could try changing `num_stages` to 1 in these places.

I hit the same problem deploying qwen2.5-vl-72b-awq with the latest main branch.

ColorfulDick avatar Apr 18 '25 13:04 ColorfulDick

Same problem here.

zoulee24 avatar Jun 19 '25 01:06 zoulee24

Same problem deploying qwen2vl-2B on an L20 45G.

Kai-dev7 avatar Sep 02 '25 02:09 Kai-dev7