lmdeploy
[Bug] Does 0.8.0 not support Windows?
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
On Windows, 0.8.0 fails to start but does not report any error.
Reproduction
@echo off
set CUDA_VISIBLE_DEVICES=1
cmd /k "conda activate lmdeploy && lmdeploy serve api_server E:\models\LLM\Qwen3-8B-AWQ --server-name 127.0.0.1 --server-port 8080 --model-name Laptop-Translation-model --backend turbomind --model-format awq --cache-max-entry-count 0.85 --session-len 2048 --max-concurrent-requests 16"
Environment
sys.platform: win32
Python: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:49:16) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: Tesla V100-SXM2-16GB
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34810 for x64
GCC: n/a
PyTorch: 2.6.0+cu124
PyTorch compiling details: PyTorch built with:
- C++ Version: 201703
- MSVC 192930157
- Intel(R) oneAPI Math Kernel Library Version 2025.0.1-Product Build 20241031 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
- OpenMP 2019
- LAPACK is enabled (usually provided by MKL)
- CPU capability usage: AVX512
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.5.4
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=2236df1770800ffea5697b11b0bb0d910b2e59e1, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/pytorch/.ci/pytorch/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.6.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.21.0+cu124
LMDeploy: 0.8.0+
transformers: 4.51.3
gradio: Not Found
fastapi: 0.115.12
pydantic: 2.11.4
triton: Not Found
Error traceback
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin, please note cuda version should >= 11.3 when compiled with cuda 11
2025-05-05 21:25:21,937 - lmdeploy - INFO - async_engine.py:259 - input backend=turbomind, backend_config=TurbomindEngineConfig(dtype='auto', model_format='awq', tp=1, dp=1, device_num=None, attn_tp_size=None, attn_dp_size=None, mlp_tp_size=None, mlp_dp_size=None, outer_dp_size=None, session_len=2048, max_batch_size=128, cache_max_entry_count=0.85, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1, communicator='nccl')
2025-05-05 21:25:21,937 - lmdeploy - INFO - async_engine.py:260 - input chat_template_config=None
2025-05-05 21:25:21,944 - lmdeploy - INFO - async_engine.py:269 - updated chat_template_onfig=ChatTemplateConfig(model_name='qwen', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, tool=None, eotool=None, separator=None, capability=None, stop_words=None)
2025-05-05 21:25:22,398 - lmdeploy - INFO - turbomind.py:312 - model_source: ModelSource.HF_MODEL
2025-05-05 21:25:22,488 - lmdeploy - INFO - turbomind.py:226 - turbomind model config:
{
"model_config": {
"model_name": "",
"chat_template": "",
"model_arch": "Qwen3ForCausalLM",
"head_num": 32,
"kv_head_num": 8,
"hidden_units": 4096,
"vocab_size": 151936,
"embedding_size": 151936,
"num_layer": 36,
"inter_size": [
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288,
12288
],
"norm_eps": 1e-06,
"attn_bias": 0,
"qk_norm": true,
"size_per_head": 128,
"group_size": 128,
"weight_type": "int4",
"session_len": 2048,
"attn_tp_size": 1,
"mlp_tp_size": 1,
"model_format": "awq",
"expert_num": [],
"expert_inter_size": 0,
"experts_per_token": 0,
"moe_shared_gate": false,
"norm_topk_prob": false,
"routed_scale": 1.0,
"topk_group": 1,
"topk_method": "greedy",
"moe_group_num": 1,
"q_lora_rank": 0,
"kv_lora_rank": 0,
"qk_rope_dim": 0,
"v_head_dim": 0,
"tune_layer_num": 1
},
"attention_config": {
"softmax_scale": 0.0,
"cache_block_seq_len": 64,
"use_logn_attn": 0,
"max_position_embeddings": 40960,
"rope_param": {
"type": "default",
"base": 1000000.0,
"dim": 128,
"factor": 1.0,
"max_position_embeddings": null,
"attention_factor": 1.0,
"beta_fast": 32,
"beta_slow": 1,
"low_freq_factor": null,
"high_freq_factor": null,
"original_max_position_embeddings": null
}
},
"lora_config": {
"lora_policy": "",
"lora_r": 0,
"lora_scale": 0.0,
"lora_max_wo_r": 0,
"lora_rank_pattern": "",
"lora_scale_pattern": ""
},
"engine_config": {
"dtype": "auto",
"model_format": "awq",
"tp": 1,
"dp": 1,
"device_num": 1,
"attn_tp_size": 1,
"attn_dp_size": 1,
"mlp_tp_size": 1,
"mlp_dp_size": 1,
"outer_dp_size": 1,
"session_len": 2048,
"max_batch_size": 128,
"cache_max_entry_count": 0.85,
"cache_chunk_size": -1,
"cache_block_seq_len": 64,
"enable_prefix_caching": false,
"quant_policy": 0,
"rope_scaling_factor": 0.0,
"use_logn_attn": false,
"download_dir": null,
"revision": null,
"max_prefill_token_num": 8192,
"num_tokens_per_iter": 8192,
"max_prefill_iters": 1,
"communicator": "nccl"
}
}
[TM][DEBUG] Set logger level by DEBUG
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 2048.
(lmdeploy) E:\models\LLM>
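The log stops after the `max_context_token_num` warning and the shell prompt returns with no traceback. One way to get more information in a case like this is to check the exit code of the process. The following is a minimal sketch, not part of LMDeploy: `exit_status` is a hypothetical helper, and it assumes the `lmdeploy` command is on `PATH` in the active conda environment. On Windows, a native crash usually appears as a large NTSTATUS value, so the code is also shown in hexadecimal.

```python
import subprocess
import sys


def exit_status(cmd):
    """Run cmd, let its output stream to the console, and return its exit code.

    Returns a tuple (code, hex_code). hex_code is the exit code masked to
    32 bits, which makes Windows NTSTATUS values easier to recognise.
    """
    proc = subprocess.run(cmd)  # blocks until the process exits
    code = proc.returncode
    hex_code = f"0x{code & 0xFFFFFFFF:08X}"
    return code, hex_code


# Example call (commented out; arguments are taken from the reproduction above):
# code, hex_code = exit_status([
#     "lmdeploy", "serve", "api_server", r"E:\models\LLM\Qwen3-8B-AWQ",
#     "--backend", "turbomind", "--model-format", "awq",
# ])
# print(code, hex_code)
```

A code of 0 would mean the process believed it exited normally; a non-zero code can be attached to the issue to help narrow down where the server stops.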
Is the hardware being detected incorrectly?
GPU 0: Tesla V100-SXM2-16GB
C:\Users\31940>nvidia-smi
Mon May 5 21:36:40 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.78 Driver Version: 551.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 49C P4 18W / 80W | 1842MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM2-16GB WDDM | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 4640 C+G ...n\NVIDIA App\CEF\NVIDIA Overlay.exe N/A |
| 0 N/A N/A 5868 C+G ...__8wekyb3d8bbwe\Notepad\Notepad.exe N/A |
| 0 N/A N/A 5892 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 5996 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 7828 C+G ...on\wallpaper_engine\wallpaper64.exe N/A |
| 0 N/A N/A 9592 C+G ...n\NVIDIA App\CEF\NVIDIA Overlay.exe N/A |
| 0 N/A N/A 9884 C+G ...x64__qmba6cd70vzyy\ArmouryCrate.exe N/A |
| 0 N/A N/A 10360 C+G C:\Windows\System32\ShellHost.exe N/A |
| 0 N/A N/A 10368 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 11364 C+G C:\Program Files\bilibili\哔哩哔哩.exe N/A |
| 0 N/A N/A 15068 C+G ...nt.CBS_cw5n1h2txyewy\SearchHost.exe N/A |
| 0 N/A N/A 15092 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 17828 C+G ...s\System32\ApplicationFrameHost.exe N/A |
| 0 N/A N/A 17928 C+G ...siveControlPanel\SystemSettings.exe N/A |
| 0 N/A N/A 19136 C+G ...CBS_cw5n1h2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 22020 C+G ...\AMD\CNext\CNext\RadeonSoftware.exe N/A |
| 0 N/A N/A 22108 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 N/A N/A 23128 C+G ...les\AMD\CNext\CNext\AMDRSSrcExt.exe N/A |
| 0 N/A N/A 24596 C+G ...crosoft\Edge\Application\msedge.exe N/A |
| 0 N/A N/A 26612 C+G ...__8wekyb3d8bbwe\WindowsTerminal.exe N/A |
+-----------------------------------------------------------------------------------------+
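The environment report lists the V100 as GPU 0, while `nvidia-smi` lists it as GPU 1. This is not necessarily a detection error: `CUDA_VISIBLE_DEVICES` restricts the devices a process can see and renumbers them from 0. The sketch below only simulates that renumbering in plain Python; `visible_devices` is a hypothetical helper, it handles integer indices only (not UUIDs), and it assumes CUDA enumerates the GPUs in the same order as `nvidia-smi`. That assumption may not hold, because CUDA's default order is fastest-first unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set. It also assumes the environment report was collected with `CUDA_VISIBLE_DEVICES=1` set.

```python
def visible_devices(physical, cuda_visible_devices):
    """Return the device names a CUDA process would see, indexed from 0.

    physical: device names in CUDA enumeration order (assumed here to match
              the nvidia-smi order).
    cuda_visible_devices: value of the CUDA_VISIBLE_DEVICES variable,
              or None if it is unset.
    """
    if cuda_visible_devices is None:
        return list(physical)
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        # CUDA ignores everything from the first invalid index onward
        if not token.isdigit() or int(token) >= len(physical):
            break
        visible.append(physical[int(token)])
    return visible


# Names as reported by nvidia-smi in this issue (the first one is truncated there)
physical = ["NVIDIA GeForce RTX 4060 ...", "Tesla V100-SXM2-16GB"]

for index, name in enumerate(visible_devices(physical, "1")):
    print(f"GPU {index}: {name}")
```

Under these assumptions the only visible device is the V100, and it is reported as GPU 0, which matches the environment report.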
Use version 0.7.2.