
[New Model]: Support Phi-3

alexkreidler opened this issue 1 year ago • 2 comments

The model to consider.

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

The closest model vllm already supports.

Phi-2 (which uses the same Transformers modeling code as Phi-1)

What's your difficulty of supporting the model you want?

Support for LongRope #3575

I tried running Phi-3-mini-128k-instruct but got this error:

langbench-vllm-1  | Traceback (most recent call last):
langbench-vllm-1  |   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
langbench-vllm-1  |     return _run_code(code, main_globals, None,
langbench-vllm-1  |   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
langbench-vllm-1  |     exec(code, run_globals)
langbench-vllm-1  |   File "/workspace/vllm/entrypoints/openai/api_server.py", line 157, in <module>
langbench-vllm-1  |     engine = AsyncLLMEngine.from_engine_args(
langbench-vllm-1  |   File "/workspace/vllm/engine/async_llm_engine.py", line 331, in from_engine_args
langbench-vllm-1  |     engine_config = engine_args.create_engine_config()
langbench-vllm-1  |   File "/workspace/vllm/engine/arg_utils.py", line 406, in create_engine_config
langbench-vllm-1  |     model_config = ModelConfig(
langbench-vllm-1  |   File "/workspace/vllm/config.py", line 125, in __init__
langbench-vllm-1  |     self.max_model_len = _get_and_verify_max_len(self.hf_text_config,
langbench-vllm-1  |   File "/workspace/vllm/config.py", line 969, in _get_and_verify_max_len
langbench-vllm-1  |     assert "factor" in rope_scaling
langbench-vllm-1  | AssertionError

because the relevant part of Phi-3's config.json is structured differently in order to support LongRope:

"rope_scaling": {
    "long_factor": [
      1.0299999713897705,
      1.0499999523162842,
      1.0499999523162842,
      1.0799999237060547,
      1.2299998998641968,
      1.2299998998641968,
      <truncated>
    ],
    "short_factor": [
      1.05,
      1.05,
      1.05,
      1.1,
      1.1,
      1.1500000000000001,
      1.2000000000000002,
      1.2500000000000002,
      <truncated>
    ],
    "type": "longrope"
  },

There may be other changes in the new modeling code that vLLM needs to support.
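For illustration, here is a rough sketch (not vLLM's actual fix, and the helper name is made up) of how the check in _get_and_verify_max_len could handle a rope_scaling block that carries long_factor/short_factor lists instead of a single "factor"; it relies on the max_position_embeddings and original_max_position_embeddings fields that Phi-3's config.json provides:

# Hypothetical sketch only -- not the actual vLLM patch.
def get_rope_scaling_factor(hf_config) -> float:
    rope_scaling = getattr(hf_config, "rope_scaling", None)
    if rope_scaling is None:
        return 1.0
    if "factor" in rope_scaling:
        # Linear / dynamic-NTK styles carry a single scalar factor.
        return rope_scaling["factor"]
    if rope_scaling.get("type") in ("su", "longrope"):
        # LongRope-style configs carry long_factor/short_factor lists;
        # the effective context extension is the ratio of the extended
        # length to the original training length (131072 / 4096 for
        # Phi-3-mini-128k).
        return (hf_config.max_position_embeddings /
                hf_config.original_max_position_embeddings)
    raise ValueError(f"Unsupported rope_scaling: {rope_scaling}")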

alexkreidler avatar Apr 23 '24 21:04 alexkreidler

Phi-3 support pending in #4298

agt avatar Apr 23 '24 22:04 agt

I think this issue can be closed now that #4298 has been merged.

DarkLight1337 avatar May 16 '24 14:05 DarkLight1337

Phi-3 small and medium seem to be working, but not mini. Phi-3-mini uses "longrope" as its rope_scaling type (https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/config.json#L128), while Phi-3-small and Phi-3-medium use "su" (https://huggingface.co/microsoft/Phi-3-small-128k-instruct/blob/main/config.json#L180). I'm not too familiar with these types; however, this currently throws an error (vllm 0.5.0.post1).
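A quick way to confirm which rope_scaling type each checkpoint declares (the repo names and the trust_remote_code flag are just my assumptions here):

from transformers import AutoConfig

for repo in (
    "microsoft/Phi-3-mini-128k-instruct",
    "microsoft/Phi-3-small-128k-instruct",
    "microsoft/Phi-3-medium-128k-instruct",
):
    # trust_remote_code may be needed for Phi-3 checkpoints on older
    # transformers releases that don't yet bundle the Phi-3 config class.
    cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
    print(repo, cfg.rope_scaling.get("type"))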

timbmg avatar Jul 04 '24 15:07 timbmg

works fine in v0.5.1

sanjay920 avatar Jul 09 '24 06:07 sanjay920

works fine in v0.5.1

I'm actually still getting this error with Phi-3 mini:

   _phi3.py", line 185, in _rope_scaling_validation                                                                       
      raise ValueError(f"`rope_scaling`'s type field must be one of ['su', 'yarn'], got {rope_scaling_type}")            
      ValueError: `rope_scaling`'s type field must be one of ['su', 'yarn'], got longrope     

And I'm on 0.5.1:

$ pdm show vllm
Name:                  vllm                                                                        
Latest version:        0.5.1                                                                       
Latest stable version: 0.5.1                                                                       
Installed version:     0.5.1                                                                       
Summary:               A high-throughput and memory-efficient inference and serving engine for LLMs
Requires Python:       >=3.8                                                                       
Author:                vLLM Team                                                                   
Author email:                                                                                      
License:               Apache 2.0                                                                  
Homepage:              https://github.com/vllm-project/vllm                                        
Project URLs:          Homepage: https://github.com/vllm-project/vllm                              
                       Documentation: https://vllm.readthedocs.io/en/latest/                       
Platform:                                                                                          
Keywords:                                                                                 

Edit: I get this with Phi-3-small too, not just mini.
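The _rope_scaling_validation in the traceback appears to come from the Phi-3 configuration/modeling code rather than from vLLM itself, so the installed transformers version (or the remote code bundled with the checkpoint) may matter here as well. A trivial check of both versions:

# Trivial sketch: print the installed versions of both packages.
import importlib.metadata as metadata

for pkg in ("vllm", "transformers"):
    print(pkg, metadata.version(pkg))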

baughmann avatar Jul 11 '24 03:07 baughmann