
💡 [REQUEST] - Support the SGLang inference engine

Open · wizd opened this issue 1 year ago · 1 comment

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

SGLang outperforms vLLM and is currently the fastest inference engine.

Basic Example

SGLang project repo: https://github.com/sgl-project/sglang

docker run -d --gpus all \
    -p 5010:5010 \
    --name sglang \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -v ~/data/models:/models \
    --env "HF_TOKEN=hf_...j" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path /models/Llama3.1-8B-Chinese-Chat --host 0.0.0.0 --port 5010 --quantization fp8 --context-length 64000
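
Once the container is up, SGLang exposes an OpenAI-compatible HTTP API. Below is a minimal client sketch, assuming the server above is listening on port 5010; the model name and prompt are illustrative, and `build_chat_request` is a hypothetical helper, not part of SGLang:

```python
import json

# Base URL matches the -p 5010:5010 mapping in the docker command above.
BASE_URL = "http://localhost:5010"

def build_chat_request(prompt: str, model: str = "Llama3.1-8B-Chinese-Chat") -> dict:
    """Assemble the JSON body for a POST to {BASE_URL}/v1/chat/completions
    (the standard OpenAI-style chat endpoint)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }

payload = build_chat_request("Hello, please introduce yourself.")
print(json.dumps(payload, ensure_ascii=False, indent=2))

# To actually send it (requires the server to be running):
#   import requests
#   r = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
#   print(r.json()["choices"][0]["message"]["content"])
```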
    

Drawbacks

vision_config is None, using default vision config
Initialization failed. controller_init_state: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 150, in start_controller_process
    controller = ControllerSingle(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 84, in __init__
    self.tp_server = ModelTpServer(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 92, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 130, in __init__
    self.load_model()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 181, in load_model
    self.model = get_model(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
    return loader.load_model(model_config=model_config,
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader/loader.py", line 280, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader/loader.py", line 108, in _initialize_model
    model_class = get_model_architecture(model_config)[0]
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader/utils.py", line 32, in get_model_architecture
    model_cls = ModelRegistry.load_model_cls(arch)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 456, in load_model_cls_srt
    raise ValueError(
ValueError: Unsupported architectures: MiniCPMV. Supported list: ['ChatGLMForCausalLM', 'ChatGLMModel', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPTBigCodeForCausalLM', 'Grok1ModelForCausalLM', 'InternLM2ForCausalLM', 'LlamaForCausalLM', 'LlamaForClassification', 'LlavaLlamaForCausalLM', 'LlavaQwenForCausalLM', 'LlavaMistralForCausalLM', 'LlavaVidForCausalLM', 'MiniCPMForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'StableLmForCausalLM', 'YiVLForCausalLM']

Initialization failed. detoken_init_state: init ok
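
The failure comes from SGLang's model registry: it reads the `architectures` field from the model's config.json and rejects any name it has no implementation for. Below is a simplified sketch of that check. The `SUPPORTED` set is abbreviated from the list in the traceback, and `load_model_cls` here is a stand-in for SGLang's internal `ModelRegistry.load_model_cls`, not its actual code:

```python
# Abbreviated from the supported list in the traceback above; the real
# registry lives in sglang/srt/model_executor/model_runner.py.
SUPPORTED = {
    "LlamaForCausalLM",
    "MiniCPMForCausalLM",  # the text-only MiniCPM is supported
    "Qwen2ForCausalLM",
    "YiVLForCausalLM",
}

def load_model_cls(architectures: list[str]) -> str:
    """Return the first supported architecture, or raise like SGLang does."""
    for arch in architectures:
        if arch in SUPPORTED:
            return arch
    raise ValueError(
        f"Unsupported architectures: {architectures}. "
        f"Supported list: {sorted(SUPPORTED)}"
    )

# MiniCPM-V's config.json declares "MiniCPMV", which is not in the set,
# so loading fails before any weights are read:
try:
    load_model_cls(["MiniCPMV"])
except ValueError as e:
    print(e)
```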

Unresolved Questions

No response

wizd · Aug 06 '24 17:08

Hello, thank you for following our work! We will consider supporting it in the future.

Cuiunbo · Aug 07 '24 08:08