
[Bug] When serving InternVL2.5/3 series models with lmdeploy, the deployment goes down within 24 hours. What is causing this?

Open lmingze opened this issue 4 months ago • 2 comments

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

When I deploy InternVL2.5/3 series models with lmdeploy, the service stops working within 24 hours of starting. What is causing this?

Reproduction

CUDA_VISIBLE_DEVICES=2,4 lmdeploy serve api_server /data22/ljc/proj/ckpt/InternVL2_5-38B-MPO-AWQ --server-port 2556 --cache-max-entry-count 0.5 --tp 2 > internvl2.5_38B.log 2>&1 &
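The report does not say whether the process exits or simply stops answering. One way to tell the two apart is a timed HTTP probe, sketched below. It rests on assumptions that are not in the issue: the server is reachable on 127.0.0.1 at the port from the command above, the OpenAI-compatible `/v1/models` route is used as a lightweight probe, and the script name `check.sh` and the `URL` and `PROBE` variables are hypothetical.

```shell
#!/bin/sh
# Classify the state of the api_server from a single HTTP probe.
# Assumptions (not from the issue): server on 127.0.0.1:2556, and
# /v1/models is a suitable lightweight route to probe.
URL="${URL:-http://127.0.0.1:2556/v1/models}"

# Map a curl exit status and an HTTP status code to a short label.
classify() {
    # $1 = curl exit status, $2 = HTTP status code
    case "$1" in
        0)  if [ "$2" = "200" ]; then echo "healthy"; else echo "http-error:$2"; fi ;;
        7)  echo "down" ;;   # could not connect: the process has likely exited
        28) echo "hung" ;;   # timed out: the process is alive but not answering
        *)  echo "curl-error:$1" ;;
    esac
}

# Send one request with a 10-second limit and classify the outcome.
probe() {
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$URL")
    classify "$?" "$code"
}

# Probe only when asked, e.g. `PROBE=1 sh check.sh`
if [ "${PROBE:-0}" = "1" ]; then
    echo "$(date '+%F %T') $(probe)"
fi
```

Run from cron or a loop, the timestamped labels show when the service stopped and whether it was "down" or "hung". A lightweight route may keep answering even if the inference engine itself is stuck, so a small completion request with the same timeout would be a stronger signal.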

Environment

Package                           Version
--------------------------------- -------------
accelerate                        1.9.0
addict                            2.4.0
aiohappyeyeballs                  2.6.1
aiohttp                           3.12.14
aiosignal                         1.4.0
airportsdata                      20250706
annotated-types                   0.7.0
anyio                             4.9.0
astor                             0.8.1
attrs                             25.3.0
av                                15.0.0
beautifulsoup4                    4.13.4
blake3                            1.0.5
cachetools                        6.1.0
cbor                              1.0.0
cbor2                             5.6.5
certifi                           2025.7.14
cffi                              1.17.1
charset-normalizer                3.4.2
click                             8.2.1
cloudpickle                       3.1.1
compressed-tensors                0.10.2
cupy-cuda12x                      13.5.1
datasets                          4.0.0
depyf                             0.19.0
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
dnspython                         2.7.0
einops                            0.8.1
email_validator                   2.2.0
et_xmlfile                        2.0.0
fastapi                           0.116.1
fastapi-cli                       0.0.8
fastapi-cloud-cli                 0.1.4
fastrlock                         0.8.3
filelock                          3.18.0
fire                              0.7.0
FlagEmbedding                     1.3.5
frozenlist                        1.7.0
fsspec                            2025.3.0
genson                            1.3.0
gguf                              0.17.1
h11                               0.16.0
hf-xet                            1.1.5
httpcore                          1.0.9
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.34.0
idna                              3.10
ijson                             3.4.0
inscriptis                        2.6.0
interegular                       0.3.3
ir_datasets                       0.5.11
iso3166                           2.1.1
Jinja2                            3.1.6
jiter                             0.10.0
joblib                            1.5.1
jsonpath-ng                       1.7.0
jsonschema                        4.25.0
jsonschema-specifications         2025.4.1
lark                              1.2.2
llguidance                        0.7.30
llvmlite                          0.44.0
lm-format-enforcer                0.10.11
lmdeploy                          0.9.2
lxml                              6.0.0
lz4                               4.4.4
markdown-it-py                    3.0.0
MarkupSafe                        3.0.2
mdurl                             0.1.2
mistral_common                    1.8.2
mmengine-lite                     0.10.7
modelscope                        1.28.1
mpmath                            1.3.0
msgpack                           1.1.1
msgspec                           0.19.0
multidict                         6.6.3
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.5
ninja                             1.11.1.4
numba                             0.61.2
numpy                             1.26.4
nvidia-cublas-cu12                12.6.4.1
nvidia-cuda-cupti-cu12            12.6.80
nvidia-cuda-nvrtc-cu12            12.6.77
nvidia-cuda-runtime-cu12          12.6.77
nvidia-cudnn-cu12                 9.5.1.17
nvidia-cufft-cu12                 11.3.0.4
nvidia-cufile-cu12                1.11.1.6
nvidia-curand-cu12                10.3.7.77
nvidia-cusolver-cu12              11.7.1.2
nvidia-cusparse-cu12              12.5.4.2
nvidia-cusparselt-cu12            0.6.3
nvidia-ml-py                      12.575.51
nvidia-nccl-cu12                  2.26.2
nvidia-nvjitlink-cu12             12.6.85
nvidia-nvtx-cu12                  12.6.77
nvitop                            1.5.2
openai                            1.90.0
opencv-python-headless            4.12.0.88
openpyxl                          3.1.5
outlines                          1.1.1
outlines_core                     0.1.26
packaging                         25.0
pandas                            2.3.1
partial-json-parser               0.2.1.1.post6
peft                              0.14.0
pillow                            11.3.0
pip                               25.1.1
platformdirs                      4.3.8
ply                               3.11
prometheus_client                 0.22.1
prometheus-fastapi-instrumentator 7.1.0
propcache                         0.3.2
protobuf                          6.31.1
psutil                            7.0.0
py-cpuinfo                        9.0.0
pyarrow                           21.0.0
pybase64                          1.4.1
pycountry                         24.6.1
pycparser                         2.22
pydantic                          2.11.7
pydantic_core                     2.33.2
pydantic-extra-types              2.10.5
Pygments                          2.19.2
pynvml                            12.0.0
python-dateutil                   2.9.0.post0
python-dotenv                     1.1.1
python-json-logger                3.3.0
python-multipart                  0.0.20
pytz                              2025.2
PyYAML                            6.0.2
pyzmq                             27.0.0
qwen-vl-utils                     0.0.11
ray                               2.48.0
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.4
rich                              14.1.0
rich-toolkit                      0.14.8
rignore                           0.6.4
rpds-py                           0.26.0
safetensors                       0.5.3
scikit-learn                      1.7.1
scipy                             1.16.0
sentence-transformers             5.1.0
sentencepiece                     0.2.0
sentry-sdk                        2.33.2
setuptools                        79.0.1
shellingham                       1.5.4
shortuuid                         1.0.13
six                               1.17.0
sniffio                           1.3.1
soundfile                         0.13.1
soupsieve                         2.7
soxr                              0.5.0.post1
starlette                         0.47.2
sympy                             1.14.0
termcolor                         3.1.0
threadpoolctl                     3.6.0
tiktoken                          0.9.0
timm                              1.0.19
tokenizers                        0.21.2
torch                             2.7.1
torchaudio                        2.7.1
torchvision                       0.22.1
tqdm                              4.67.1
transformers                      4.53.3
trec-car-tools                    2.6
triton                            3.3.1
typer                             0.16.0
typing_extensions                 4.14.1
typing-inspection                 0.4.1
tzdata                            2025.2
unlzw3                            0.2.3
urllib3                           2.5.0
uvicorn                           0.35.0
uvloop                            0.21.0
vllm                              0.10.0
warc3-wet                         0.2.5
warc3-wet-clueweb09               0.2.5
watchfiles                        1.1.0
websockets                        15.0.1
wheel                             0.45.1
xformers                          0.0.31
xgrammar                          0.1.21
xxhash                            3.5.0
yapf                              0.43.0
yarl                              1.20.1
zlib-state                        0.1.9

Error traceback


lmingze avatar Aug 14 '25 03:08 lmingze

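Is the server hanging? For VLM models with tensor parallelism (tp) enabled, we recommend using the PyTorch engine, i.e. --backend pytorch. In this case TurboMind is at risk of hanging. We are still working on a fix.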
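Applied to the command from the Reproduction section, that suggestion might look like the sketch below. Only `--backend pytorch` is new; the other flags are copied from the original command. The `MODEL_PATH` and `RUN` variables are hypothetical conveniences, and the thread does not confirm whether the PyTorch engine handles this particular AWQ checkpoint.

```shell
#!/bin/sh
# Reproduction command with the PyTorch engine selected instead of TurboMind.
# Dry run by default (prints the command); set RUN=1 to start the server.
MODEL_PATH="${MODEL_PATH:-/data22/ljc/proj/ckpt/InternVL2_5-38B-MPO-AWQ}"

# Build the command as positional parameters so the path stays safely quoted.
set -- lmdeploy serve api_server "$MODEL_PATH" \
    --backend pytorch \
    --server-port 2556 \
    --cache-max-entry-count 0.5 \
    --tp 2

if [ "${RUN:-0}" = "1" ]; then
    # Same GPUs and log file as in the original report.
    CUDA_VISIBLE_DEVICES=2,4 "$@" > internvl2.5_38B.log 2>&1 &
else
    echo "$*"
fi
```

Printing the command first makes it easy to check the flags before committing two GPUs to a long-running process.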

lvhan028 avatar Aug 15 '25 03:08 lvhan028

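Is the server hanging? For VLM models with tensor parallelism (tp) enabled, we recommend using the PyTorch engine, i.e. --backend pytorch. In this case TurboMind is at risk of hanging. We are still working on a fix.

@lvhan028 What causes TurboMind to hang?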

chengyuma avatar Aug 16 '25 13:08 chengyuma