
[Feature]: Chunked prefill + lora

Open rkooo567 opened this issue 1 year ago • 12 comments

🚀 The feature, motivation and pitch

Currently, LoRA doesn't work with chunked prefill because some of the LoRA index logic doesn't cover the case where sampling is not required. This also means LoRA is not working with sampling_params do_sample=True.

We need to add test cases for both. WIP: https://github.com/vllm-project/vllm/pull/4994
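For context, this is the feature combination the issue is about; a minimal sketch, assuming a placeholder base model and adapter path (neither is from the original report):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter path, for illustration only.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    enable_chunked_prefill=True,  # the combination this issue tracks
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```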

Alternatives

No response

Additional context

No response

rkooo567 avatar May 23 '24 01:05 rkooo567

@rkooo567 can you share an example to reproduce this issue?

rohithkrn avatar Jun 18 '24 18:06 rohithkrn

I think you can simply create a test case by enabling chunked prefill in any LoRA correctness test!

rkooo567 avatar Jun 19 '24 01:06 rkooo567

https://github.com/vllm-project/vllm/tree/main/tests/lora

rkooo567 avatar Jun 19 '24 01:06 rkooo567
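A minimal sketch of what such a test could look like, assuming a small base model and a matching adapter are available (both paths are placeholders; the real tests under tests/lora get these from fixtures). Note that building two LLM instances in one process may not fit on a single GPU; this only illustrates the comparison:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder
ADAPTER_PATH = "/path/to/lora/adapter"   # placeholder


def _greedy_lora_output(enable_chunked_prefill: bool) -> str:
    llm = LLM(
        model=BASE_MODEL,
        enable_lora=True,
        enable_chunked_prefill=enable_chunked_prefill,
    )
    out = llm.generate(
        ["The capital of France is"],
        SamplingParams(temperature=0.0, max_tokens=16),  # greedy decoding
        lora_request=LoRARequest("adapter", 1, ADAPTER_PATH),
    )
    return out[0].outputs[0].text


def test_lora_chunked_prefill_matches_baseline():
    # With greedy sampling the generated text should be identical
    # whether or not the prefill is chunked.
    assert _greedy_lora_output(False) == _greedy_lora_output(True)
```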

@rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

rohithkrn avatar Jun 19 '24 20:06 rohithkrn

@rkooo567 also are you seeing garbage output or an error?

rohithkrn avatar Jun 20 '24 00:06 rohithkrn

You mean "This also means LoRA is not working with sampling_params do_sample=False"? @rkooo567

sfc-gh-zhwang avatar Sep 18 '24 06:09 sfc-gh-zhwang

> @rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

Hi, I just saw this. I think LoRA + chunked prefill is currently broken at a basic level, because LoRA assumes an index mapping that only holds under the default scheduling policy. The side effect could be wrong output or a crash.

rkooo567 avatar Sep 18 '24 16:09 rkooo567
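To illustrate the kind of assumption being described, here is a simplified model (not vLLM's actual internals): the LoRA machinery needs one adapter-index mapping per input token and a separate one per sampled token. With chunked prefill, a partial prompt chunk contributes input tokens but no sampled token, which is exactly the "sampling is not required" case the original index logic missed:

```python
# Simplified illustration, not vLLM internals.
# Each scheduled sequence contributes (num_tokens, needs_sampling, adapter_id).
batch = [
    (5, False, 1),  # chunk of a longer prompt: prefill only, no sampling yet
    (3, True,  2),  # final prompt chunk: sampling required
    (1, True,  1),  # decode step: one token, sampling required
]

token_mapping = []    # adapter id per input token
sampler_mapping = []  # adapter id per *sampled* token
for num_tokens, needs_sampling, adapter_id in batch:
    token_mapping += [adapter_id] * num_tokens
    if needs_sampling:  # the case the original index logic missed
        sampler_mapping.append(adapter_id)

print(token_mapping)    # [1, 1, 1, 1, 1, 2, 2, 2, 1]
print(sampler_mapping)  # [2, 1] -- fewer entries than scheduled sequences
```

If the sampler-side mapping is instead built with one entry per sequence, it misaligns as soon as a partial-prefill chunk appears in the batch, which matches the wrong-output-or-crash symptoms above.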

> You mean "This also means LoRA is not working with sampling_params do_sample=False"? @rkooo567

Yes! That's right.

rkooo567 avatar Sep 18 '24 16:09 rkooo567

Will chunked prefill ever work with LoRA?

Nero10578 avatar Nov 12 '24 23:11 Nero10578

see https://github.com/vllm-project/vllm/pull/9057 @Nero10578

sfc-gh-zhwang avatar Nov 12 '24 23:11 sfc-gh-zhwang

@sfc-gh-zhwang is there a plan for when this feature will be merged?

mces89 avatar Dec 03 '24 00:12 mces89

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Mar 03 '25 02:03 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Apr 02 '25 02:04 github-actions[bot]