Compatibility issues and memory leak problems with --enable-flashinfer

Open pj-ml opened this issue 10 months ago • 3 comments

Version: sglang==0.1.14
Hardware: EC2 g5.xlarge

Hi, I am launching the server with the following command:

python -m sglang.launch_server --model-path openchat/openchat-3.5-0106 --port 30000 --mem-fraction-static 0.8 --enable-flashinfer

I notice two problems when running the above:

  1. With --enable-flashinfer, the gemma script is invoked for some reason, even though I believe openchat is a fine-tuned version of Mistral. Without --enable-flashinfer, the server starts up and works as expected.
  2. The gemma script imports from vllm.model_executor.input_metadata, but input_metadata.py was removed in vllm 0.4.0.
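
For anyone trying to confirm problem 2 on their own machine, here is a minimal sketch that checks whether a module path can still be located and which version of a distribution is installed. It uses only the standard library; the helper names `module_available` and `installed_version` are made up for illustration and are not part of sglang or vllm.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be located.

    find_spec imports the parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A parent package (for example vllm itself) is missing or broken.
        return False


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Module path taken from the report above; the result depends on the
    # vllm version installed in the current environment.
    print("vllm version:", installed_version("vllm"))
    print(
        "input_metadata importable:",
        module_available("vllm.model_executor.input_metadata"),
    )
```

If the report above is accurate, the second line should print False on vllm 0.4.0 and True on 0.3.3.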

Downgrading vllm to 0.3.3 gets the server up and running, but then a KV cache pool leak occurs, which I see was mentioned in #236. This may be a third issue, but I am unsure whether it will persist after problem 1 has been fixed.

Apologies for not posting the error message, but you should be able to reproduce the bug fairly easily.

pj-ml avatar Apr 09 '24 12:04 pj-ml

I see that problem 2 was fixed in https://github.com/sgl-project/sglang/commit/b0890631a011be28d5ef5a0b4d5551fdeb94ab25

pj-ml avatar Apr 09 '24 14:04 pj-ml

Does this mean problem 1 is fixed as well, @merrymercy?

pj-ml avatar Apr 16 '24 17:04 pj-ml

Problem 1 is not a bug, because we need to import all models. We will work on fixing the compatibility issues and memory leak problems.
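
To illustrate why importing all models means a gemma import error can show up while serving a Mistral-based model, here is a rough sketch of an eager model loader. This is not sglang's actual loader: the package `demo_models`, the module `removed_dependency_xyz`, and the function names are hypothetical, and the package is built in a temporary directory so the example is self-contained.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def import_all_models(package_name: str) -> dict:
    """Import every submodule of a package and record the outcome per module."""
    package = importlib.import_module(package_name)
    results = {}
    for info in pkgutil.iter_modules(package.__path__):
        full_name = f"{package_name}.{info.name}"
        try:
            importlib.import_module(full_name)
            results[info.name] = "ok"
        except ImportError as exc:
            results[info.name] = f"failed: {exc}"
    return results


def build_demo_package(root: Path) -> str:
    """Create a throwaway package with one healthy and one broken model module."""
    pkg = root / "demo_models"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "mistral.py").write_text("NAME = 'mistral'\n")
    # Stands in for a model file that imports a module removed upstream.
    (pkg / "gemma.py").write_text("import removed_dependency_xyz\n")
    return "demo_models"


def run_demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            name = build_demo_package(Path(tmp))
            importlib.invalidate_caches()
            return import_all_models(name)
        finally:
            sys.path.remove(tmp)
            # Drop cached modules so the demo can be run again cleanly.
            for key in [k for k in sys.modules if k.split(".")[0] == "demo_models"]:
                del sys.modules[key]


if __name__ == "__main__":
    for model, status in run_demo().items():
        print(model, "->", status)
```

In this sketch the loader catches the ImportError per module; a loader that does not catch it would fail at startup on the broken module regardless of which model was requested.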

merrymercy avatar Apr 17 '24 16:04 merrymercy