sglang
Compatibility issues and memory leak problems with --enable-flashinfer
Version: sglang==0.1.14 Hardware: ec2 g5.xlarge
Hi, when using the following line:
python -m sglang.launch_server --model-path openchat/openchat-3.5-0106 --port 30000 --mem-fraction-static 0.8 --enable-flashinfer
I notice two problems when running the above:
1. When using --enable-flashinfer, the gemma script is invoked for some reason (I believe openchat is a fine-tuned version of Mistral). When not using --enable-flashinfer, the server starts up and works as expected.
2. The gemma script imports from vllm.model_executor.input_metadata, but input_metadata.py was removed in vllm 0.4.0.
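For anyone trying to confirm the second point on their own machine, here is a minimal sketch that checks whether a dotted module path can be resolved in the current environment. The helper name `module_available` is hypothetical and not part of sglang or vllm; it only relies on the standard library. Note that resolving a submodule imports its parent package, so the check can be slow for large packages.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        # find_spec imports parent packages, so a missing parent
        # surfaces as ModuleNotFoundError rather than returning None.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Module path taken from the report above.
    target = "vllm.model_executor.input_metadata"
    print(target, "available:", module_available(target))
```

If this prints False with vllm installed, the installed vllm presumably no longer ships that module, which would match the import error described above.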
Downgrading vllm to 0.3.3 gets the server up and running, but then a KV pool cache leak occurs, which I see was mentioned in #236. This may be a third issue, but I am unsure whether it will persist once 1 has been fixed.
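Since the behaviour depends on the installed vllm version, a small sketch for reporting the relevant versions may help when reproducing this. The function names are hypothetical, the version parsing is deliberately naive (it only reads the leading numeric components and does not handle pre-release tags such as rc1), and the 0.4.0 threshold is simply the one stated in this report.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components, e.g. '0.4.0.post1' -> (0, 4, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def predates_module_removal(version: str) -> bool:
    """True if the version is older than 0.4.0 (threshold from this report)."""
    return release_tuple(version) < (0, 4, 0)


if __name__ == "__main__":
    for dist in ("sglang", "vllm"):
        print(dist, installed_version(dist))
```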
Apologies for not posting the error message, but you should be able to reproduce the bug fairly easily.
I see 2. was fixed with https://github.com/sgl-project/sglang/commit/b0890631a011be28d5ef5a0b4d5551fdeb94ab25
Does this mean the problem with 1. is fixed @merrymercy?
1 is not a bug, because we need to import all models. We will work on fixing the compatibility issues and memory leak problems.
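To illustrate why an "import all models" step can touch a model file unrelated to the one requested, here is a generic sketch of an eager import loop. This is not sglang's actual loader: `import_all` and the module names passed to it are hypothetical, and this version tolerates failures only to show the idea.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable, Tuple


def import_all(
    module_names: Iterable[str],
) -> Tuple[Dict[str, ModuleType], Dict[str, Exception]]:
    """Import every named module, collecting failures instead of raising."""
    loaded: Dict[str, ModuleType] = {}
    failed: Dict[str, Exception] = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as exc:
            # One broken module (e.g. a stale dependency import) is
            # recorded here rather than aborting the whole scan.
            failed[name] = exc
    return loaded, failed


if __name__ == "__main__":
    # Standard-library names stand in for model modules in this sketch.
    loaded, failed = import_all(["json", "no_such_module_abc_xyz"])
    print("loaded:", sorted(loaded))
    print("failed:", sorted(failed))
```

Without the try/except, the first module whose imports are broken would stop the scan, whichever model was actually requested.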