sglang
Compatibility issues and memory leak problems with --enable-flashinfer
Version: sglang==0.1.14 Hardware: ec2 g5.xlarge
Hi, when using the following line:
python -m sglang.launch_server --model-path openchat/openchat-3.5-0106 --port 30000 --mem-fraction-static 0.8 --enable-flashinfer
So, I notice two problems when running the above:
1. When using `--enable-flashinfer`, the gemma script is invoked for some reason (I believe openchat is a finetuned version of mistral). When not using `--enable-flashinfer`, the server starts up and works as expected.
2. The gemma script imports from `vllm.model_executor.input_metadata`, but `input_metadata.py` was removed in vllm 0.4.0.
Downgrading vllm to 0.3.3 gets the server up and running, but then a KV pool cache leak occurs, which I see was mentioned in #236. This may be a third issue, but I am unsure whether it will persist after 1. has been fixed.
Apologies for not posting the error message, but you should be able to reproduce the bug fairly easily.
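A quick way to confirm whether the installed vllm still ships the module named above is to probe for it before launching the server. The following is a minimal sketch, not part of sglang or vllm: `module_available` is a hypothetical helper name, and it assumes that importing the parent `vllm` package either succeeds or fails with an ordinary `ImportError` in your environment.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    Hypothetical helper for illustration. Note that find_spec imports the
    parent package of a dotted name, so this is not entirely side-effect free.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Parent package missing, parent is not a package, or no usable spec.
        return False


# Module path taken from the report above.
target = "vllm.model_executor.input_metadata"
status = "present" if module_available(target) else "missing"
print(f"{target}: {status}")
```

If the module is reported missing, the installed vllm is probably newer than the version this sglang release expects.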
I see 2. was fixed with https://github.com/sgl-project/sglang/commit/b0890631a011be28d5ef5a0b4d5551fdeb94ab25
Does this mean the problem with 1. is fixed @merrymercy?
1. is not a bug, because we need to import all models. We will work on fixing the compatibility issues and memory leak problems.
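To illustrate why importing all models can surface an error from a model file you are not serving, here is a small self-contained sketch. It is an assumption-laden toy, not sglang's actual loader: the package `demo_models`, the function `import_all_models`, and the missing module `removed_dependency_xyz` are all hypothetical names created only for this example.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def import_all_models(package_name: str) -> dict:
    """Import every submodule of a package (hypothetical registry loader)."""
    package = importlib.import_module(package_name)
    loaded = {}
    for info in pkgutil.iter_modules(package.__path__):
        full_name = f"{package_name}.{info.name}"
        loaded[info.name] = importlib.import_module(full_name)
    return loaded


def build_demo_package(root: Path) -> None:
    """Write a toy package with one healthy and one broken model file."""
    pkg = root / "demo_models"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "mistral.py").write_text("ENTRY = 'MistralDemo'\n")
    # This file depends on a module that does not exist, mimicking an
    # import that was removed from a dependency.
    (pkg / "gemma.py").write_text(
        "import removed_dependency_xyz\nENTRY = 'GemmaDemo'\n"
    )


tmp = tempfile.TemporaryDirectory()
build_demo_package(Path(tmp.name))
sys.path.insert(0, tmp.name)
importlib.invalidate_caches()

try:
    import_all_models("demo_models")
    failure = None
except ModuleNotFoundError as exc:
    # The broken file fails the whole load, even if only mistral is wanted.
    failure = exc.name

print(f"import failed on missing module: {failure}")
```

In this toy setup the load fails because of `gemma.py` even though the caller only cares about `mistral.py`, which is the same shape as the behaviour described in 1.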
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.