[Bug] triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Hello, I tried to deploy DeepSeek-Coder-V2-Lite-Instruct with lmdeploy and got the following error:
triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
My machine is an A6000 with 48 GB of memory, which should be enough to deploy a 16B model.
The code is as follows:
from lmdeploy import pipeline, PytorchEngineConfig

backend_config = PytorchEngineConfig(tp=1, block_size=32)
LLM = pipeline(self.MODEL_PATH, backend_config=backend_config)
I tried reducing block_size, switching to the turbomind backend, and changing cache_max_entry_count, but none of this helped (see the sketch below).
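For reference, a minimal sketch of the three workarounds described above, using lmdeploy's documented engine-config parameters; the concrete values and the MODEL_PATH string are illustrative assumptions, not the exact ones I used:

from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

MODEL_PATH = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # illustrative path

# Attempt 1: smaller KV-cache blocks on the PyTorch engine
pt_small_blocks = PytorchEngineConfig(tp=1, block_size=16)

# Attempt 2: give the KV cache a smaller fraction of free GPU memory
pt_small_cache = PytorchEngineConfig(tp=1, cache_max_entry_count=0.4)

# Attempt 3: switch to the TurboMind backend entirely
tm_config = TurbomindEngineConfig(tp=1, cache_max_entry_count=0.4)

LLM = pipeline(MODEL_PATH, backend_config=tm_config)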
Reproduction
1
Environment
1
Error traceback
1