
[Bug]: Running sglang.launch_server raises KeyError: 'rope_type'

Open · JV-X opened this issue 1 year ago · 0 comments

Is there an existing issue?

  • [X] I have searched, and there is no existing issue.

Describe the bug

I installed the sglang package with pip following the official sglang docs (https://sgl-project.github.io/start/install.html), then, as instructed in the MiniCPM README, ran python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml. The command fails with the following error:

(sgl) hygx@hygx:~$ python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
[2024-12-10 09:17:10] server_args=ServerArgs(model_path='openbmb/MiniCPM3-4B', tokenizer_path='openbmb/MiniCPM3-4B', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='openbmb/MiniCPM3-4B', chat_template='chatml', is_embedding=False, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, random_seed=186360464, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
config.json: 100%|█████████████████████████████████████████████████████████████████| 1.93k/1.93k [00:00<00:00, 19.5MB/s]
configuration_minicpm.py: 100%|████████████████████████████████████████████████████| 9.23k/9.23k [00:00<00:00, 60.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM3-4B:
- configuration_minicpm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
tokenizer_config.json: 100%|███████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 82.3MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████| 1.18M/1.18M [00:00<00:00, 8.66MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████| 3.68M/3.68M [00:01<00:00, 2.77MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████| 216/216 [00:00<00:00, 2.14MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 5.12MB/s]
[2024-12-10 09:17:17] Use chat template for the OpenAI-compatible API server: chatml
[2024-12-10 09:17:18 TP0] MLA optimization is turned on. Use triton backend.
[2024-12-10 09:17:18 TP0] Init torch distributed begin.
[2024-12-10 09:17:18 TP0] Load weight begin. avail mem=22.50 GB
[2024-12-10 09:17:18 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1493, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 191, in __init__
    self.tp_worker = TpWorkerClass(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 62, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 62, in __init__
    self.model_runner = ModelRunner(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 155, in __init__
    self.load_model()
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 253, in load_model
    self.model = get_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 357, in load_model
    model = _initialize_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 138, in _initialize_model
    return model_class(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 551, in __init__
    self.model = MiniCPM3Model(config, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 508, in __init__
    [
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 509, in <listcomp>
    MiniCPM3DecoderLayer(config, i, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 416, in __init__
    self.self_attn = MiniCPM3AttentionMLA(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 313, in __init__
    self.rotary_emb = get_rope(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 978, in get_rope
    scaling_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

Killed
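
The failing line in vllm's rotary_embedding.py is scaling_type = rope_scaling["rope_type"], so I suspect the rope_scaling dict in the model's config.json still uses the older "type" key rather than "rope_type". A minimal diagnostic sketch to confirm this, assuming the transformers package that sglang installs:

# Diagnostic sketch: print the rope_scaling dict from the model's config.json
# to see whether it carries "rope_type" or only the legacy "type" key.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "openbmb/MiniCPM3-4B",
    trust_remote_code=True,  # the MiniCPM3 config class is custom, so this is required
)
print(getattr(config, "rope_scaling", None))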

What changes do I need to make to get this running?
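
One workaround I have not verified: add a "rope_type" key (mirroring the existing "type" value) to rope_scaling, either by editing the cached config.json under ~/.cache/huggingface/hub, or via the --json-model-override-args flag visible in the server args above. Note that the override may replace the whole rope_scaling dict rather than merge into it, in which case any other fields from the original config would need to be copied in as well; the "longrope" value below is a guess and should match whatever "type" the original config declares.

# Unverified sketch: inject a "rope_type" key via the override flag.
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code \
    --port 30000 --chat-template chatml \
    --json-model-override-args '{"rope_scaling": {"rope_type": "longrope", "type": "longrope"}}'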

To Reproduce

conda create -n sgl python==3.10
conda activate sgl
python -m pip install --upgrade pip
python -m pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/
source switch_cuda.sh 11.6
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml

Expected behavior

The model runs successfully on my local machine.

Screenshots

No response

Environment

- OS: WSL2 on Windows 11
- PyTorch: 2.5.1+cu124
- CUDA: 11.6
- Device: i9-14900KF + RTX 4090 D

Additional context

No response

JV-X · Dec 10 '24, 01:12