sglang compatibility issues and memory leak problems --enable-flashinfer


Status: Open · opened by pj-ml · 3 comments

Version: sglang==0.1.14
Hardware: EC2 g5.xlarge

Hi, I launched the server with the following command:

python -m sglang.launch_server --model-path openchat/openchat-3.5-0106 --port 30000 --mem-fraction-static 0.8 --enable-flashinfer

I noticed two problems when running the command above:

1. When using --enable-flashinfer, the gemma script is invoked for some reason (I believe openchat is a fine-tuned version of Mistral). Without --enable-flashinfer, the server starts up and works as expected.
2. The gemma script imports from vllm.model_executor.input_metadata, but input_metadata.py was removed in vllm 0.4.0.
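To check which side of the vllm 0.4.0 change an environment is on, a small diagnostic like the one below may help. It is a sketch that uses only the Python standard library (`importlib.metadata.version` and `importlib.util.find_spec`); the helper names `installed_version` and `module_available` are hypothetical. Note that `find_spec` on a dotted name imports the parent packages, so if vllm is installed this will import `vllm.model_executor`.

```python
from __future__ import annotations

import importlib.util
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located.

    Note: find_spec imports the parent packages of a dotted name
    (here `vllm` and `vllm.model_executor`) in order to search them.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # ModuleNotFoundError (a subclass of ImportError) is raised
        # when a parent package is missing.
        return False


if __name__ == "__main__":
    print("sglang:", installed_version("sglang"))
    print("vllm:", installed_version("vllm"))
    # Per the report, this should be False on vllm 0.4.0,
    # where input_metadata.py was removed.
    print(
        "vllm.model_executor.input_metadata present:",
        module_available("vllm.model_executor.input_metadata"),
    )
```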

Downgrading vllm to 0.3.3 gets the server up and running, but then a KV cache pool leak occurs, which I see was already reported in #236. This may be a third issue, but I am unsure whether it will persist once 1. has been fixed.
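A KV cache pool leak means that token slots in the KV cache are allocated for a request but never returned when the request finishes, so the pool's free capacity shrinks over time. The toy model below illustrates the idea with a hypothetical `TokenSlotPool` class; it is not sglang's actual memory pool, and all names are invented for illustration. Comparing the number of still-allocated slots against zero once every request has finished is one simple way to detect such a leak.

```python
from __future__ import annotations


class TokenSlotPool:
    """Hypothetical fixed-size pool of KV cache slots (illustration only, not sglang's class)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.free_slots = list(range(size))

    def available(self) -> int:
        return len(self.free_slots)

    def alloc(self, n: int) -> list[int]:
        """Hand out `n` free slot indices, or fail if the pool is exhausted."""
        if n > len(self.free_slots):
            raise MemoryError(f"requested {n} slots, only {len(self.free_slots)} free")
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return taken

    def free(self, slots: list[int]) -> None:
        """Return slots to the pool when a request finishes."""
        self.free_slots.extend(slots)

    def leaked(self) -> int:
        """Slots still allocated; should be 0 once every request has finished."""
        return self.size - len(self.free_slots)


pool = TokenSlotPool(size=8)
a = pool.alloc(3)
b = pool.alloc(2)
pool.free(a)  # request `a` finishes and releases its slots
# request `b` also finished, but (the bug) its slots were never freed
print("leaked slots:", pool.leaked())  # → leaked slots: 2
```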

Apologies for not posting the error message, but you should be able to reproduce the bug fairly easily.

Apr 09 '24 12:04 pj-ml

I see that 2. was fixed by https://github.com/sgl-project/sglang/commit/b0890631a011be28d5ef5a0b4d5551fdeb94ab25

Apr 09 '24 14:04 pj-ml

Does this mean the problem with 1. is fixed, @merrymercy?

Apr 16 '24 17:04 pj-ml

1. is not a bug, because we need to import all models. We will work on fixing the compatibility issues and memory leak problems.

Apr 17 '24 16:04 merrymercy
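The explanation for 1. is that every model module is imported at startup, so the gemma file is loaded even when serving an OpenChat (Mistral-based) model, and per the report its import of the removed vllm.model_executor.input_metadata module breaks the launch. The sketch below illustrates that failure mode with a hypothetical `import_all_models` helper and stand-in module names; it is not sglang's actual model-loading code.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def import_all_models(module_names: list[str]) -> dict[str, ModuleType]:
    """Hypothetical eager registry: import every model module up front.

    If any single module fails to import, the whole call fails, even when
    the model actually being served lives in a different module.
    """
    registry: dict[str, ModuleType] = {}
    for name in module_names:
        registry[name] = importlib.import_module(name)  # raises on the first bad module
    return registry


# Stand-ins: `json` and `csv` play the role of working model files,
# `missing_gemma_module` plays the role of a file with a broken import.
modules = ["json", "csv", "missing_gemma_module"]
try:
    import_all_models(modules)
except ModuleNotFoundError as exc:
    print("startup failed while importing:", exc.name)
```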
