[New Model]: Support Phi-3
The model to consider.
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
The closest model vllm already supports.
Phi-2 (which uses the same Transformers model class as Phi-1)
What's your difficulty of supporting the model you want?
Support for LongRope #3575
I tried running Phi-3-mini-128k-instruct but got this error:
langbench-vllm-1 | Traceback (most recent call last):
langbench-vllm-1 | File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
langbench-vllm-1 | return _run_code(code, main_globals, None,
langbench-vllm-1 | File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
langbench-vllm-1 | exec(code, run_globals)
langbench-vllm-1 | File "/workspace/vllm/entrypoints/openai/api_server.py", line 157, in <module>
langbench-vllm-1 | engine = AsyncLLMEngine.from_engine_args(
langbench-vllm-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 331, in from_engine_args
langbench-vllm-1 | engine_config = engine_args.create_engine_config()
langbench-vllm-1 | File "/workspace/vllm/engine/arg_utils.py", line 406, in create_engine_config
langbench-vllm-1 | model_config = ModelConfig(
langbench-vllm-1 | File "/workspace/vllm/config.py", line 125, in __init__
langbench-vllm-1 | self.max_model_len = _get_and_verify_max_len(self.hf_text_config,
langbench-vllm-1 | File "/workspace/vllm/config.py", line 969, in _get_and_verify_max_len
langbench-vllm-1 | assert "factor" in rope_scaling
langbench-vllm-1 | AssertionError
because the relevant part of Phi-3's config.json is structured differently to support LongRoPE:
"rope_scaling": {
"long_factor": [
1.0299999713897705,
1.0499999523162842,
1.0499999523162842,
1.0799999237060547,
1.2299998998641968,
1.2299998998641968,
<truncated>
],
"short_factor": [
1.05,
1.05,
1.05,
1.1,
1.1,
1.1500000000000001,
1.2000000000000002,
1.2500000000000002,
<truncated>
],
"type": "longrope"
},
There may be other changes in the new modeling code that vLLM needs to support.
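For reference, here is a minimal sketch of how the max-length check could special-case this config shape. This is purely illustrative; `derive_max_model_len` and the exact branching are my assumptions, not vLLM's actual `_get_and_verify_max_len`:

```python
# Illustrative sketch only; derive_max_model_len is a hypothetical helper,
# not vLLM's real code path.
def derive_max_model_len(hf_config) -> int:
    max_len = hf_config.max_position_embeddings
    rope_scaling = getattr(hf_config, "rope_scaling", None)
    if rope_scaling is None:
        return max_len

    if rope_scaling.get("type") in ("su", "longrope"):
        # LongRoPE-style configs carry per-dimension "short_factor" /
        # "long_factor" arrays instead of a single "factor", and the
        # extended context is already reflected in max_position_embeddings
        # (131072 for the 128k variant), so there is nothing to multiply.
        return max_len

    # Other scaling types (linear, dynamic, yarn) provide one scalar
    # "factor" that multiplies the base context length; this is the case
    # the current assert expects.
    assert "factor" in rope_scaling
    return int(max_len * rope_scaling["factor"])
```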
Phi-3 support pending in #4298
I think this issue can be closed now that #4298 has been merged.
Phi-3 small and medium seem to be working, but not mini. Phi-3-mini uses "longrope" as its rope_scaling type (https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/config.json#L128), while Phi-3-small and Phi-3-medium use "su" (https://huggingface.co/microsoft/Phi-3-small-128k-instruct/blob/main/config.json#L180). I'm not too familiar with these types, but this currently throws an error (vLLM 0.5.0.post1).
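If anyone wants to double-check which type a given checkpoint uses, a quick read of the config works (plain huggingface_hub plus json, nothing vLLM-specific):

```python
import json
from huggingface_hub import hf_hub_download

# Inspect the rope_scaling "type" field of each checkpoint's config.json.
for repo in ("microsoft/Phi-3-mini-128k-instruct",
             "microsoft/Phi-3-small-128k-instruct"):
    path = hf_hub_download(repo, "config.json")
    with open(path) as f:
        cfg = json.load(f)
    print(repo, "->", cfg["rope_scaling"]["type"])
# At the time of this comment: "longrope" for mini, "su" for small.
```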
works fine in v0.5.1
I'm actually still getting this error with Phi-3 mini:
_phi3.py", line 185, in _rope_scaling_validation
raise ValueError(f"`rope_scaling`'s type field must be one of ['su', 'yarn'], got {rope_scaling_type}")
ValueError: `rope_scaling`'s type field must be one of ['su', 'yarn'], got longrope
And I'm on 0.5.1:
$ pdm show vllm
Name: vllm
Latest version: 0.5.1
Latest stable version: 0.5.1
Installed version: 0.5.1
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Requires Python: >=3.8
Author: vLLM Team
Author email:
License: Apache 2.0
Homepage: https://github.com/vllm-project/vllm
Project URLs: Homepage: https://github.com/vllm-project/vllm
Documentation: https://vllm.readthedocs.io/en/latest/
Platform:
Keywords:
Edit: I get this with Phi-3-small too, not just mini.
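In case a stale install is the culprit, this is the first diagnostic I'd run (it only prints versions; the idea that an older transformers build or cached Phi-3 modeling code rejects "longrope" is my guess, not confirmed):

```python
from importlib.metadata import version

# Print the installed versions of the packages involved; the validation in
# configuration_phi3.py comes from transformers (or cached remote code), so
# vllm alone being 0.5.1 may not be enough.
for pkg in ("vllm", "transformers"):
    print(pkg, version(pkg))
```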