
[Feature]: Chunked prefill + lora

Open rkooo567 opened this issue 1 year ago • 12 comments

🚀 The feature, motivation and pitch

Currently, LoRA doesn't work with chunked prefill because some of the LoRA index logic doesn't cover the case where sampling is not required. This also means LoRA is not working with sampling_params do_sample=True.

We need to add test cases for both. WIP: https://github.com/vllm-project/vllm/pull/4994
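For context, this is the feature combination the issue is about; a minimal sketch, assuming a placeholder base model and adapter path (neither is from the original report):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter path, for illustration only.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    enable_chunked_prefill=True,  # the combination this issue tracks
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```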

Alternatives

No response

Additional context

No response

rkooo567 avatar May 23 '24 01:05 rkooo567

@rkooo567 can you share an example to reproduce this issue?

rohithkrn avatar Jun 18 '24 18:06 rohithkrn

I think you can simply create a test case by enabling chunked prefill in any LoRA correctness test!

rkooo567 avatar Jun 19 '24 01:06 rkooo567

https://github.com/vllm-project/vllm/tree/main/tests/lora

rkooo567 avatar Jun 19 '24 01:06 rkooo567
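A minimal sketch of what such a test could look like, assuming a small base model and a matching adapter are available (both paths are placeholders; the real tests under tests/lora get these from fixtures). Note that building two LLM instances in one process may not fit on a single GPU; this only illustrates the comparison:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder
ADAPTER_PATH = "/path/to/lora/adapter"   # placeholder


def _greedy_lora_output(enable_chunked_prefill: bool) -> str:
    llm = LLM(
        model=BASE_MODEL,
        enable_lora=True,
        enable_chunked_prefill=enable_chunked_prefill,
    )
    out = llm.generate(
        ["The capital of France is"],
        SamplingParams(temperature=0.0, max_tokens=16),  # greedy decoding
        lora_request=LoRARequest("adapter", 1, ADAPTER_PATH),
    )
    return out[0].outputs[0].text


def test_lora_chunked_prefill_matches_baseline():
    # With greedy sampling the generated text should be identical
    # whether or not the prefill is chunked.
    assert _greedy_lora_output(False) == _greedy_lora_output(True)
```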

@rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

rohithkrn avatar Jun 19 '24 20:06 rohithkrn

@rkooo567 also are you seeing garbage output or an error?

rohithkrn avatar Jun 20 '24 00:06 rohithkrn

You mean "This also means LoRA is not working with sampling_params do_sample=False"? @rkooo567

sfc-gh-zhwang avatar Sep 18 '24 06:09 sfc-gh-zhwang

> @rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

Hi, I just saw this. I think LoRA + chunked prefill is currently broken at a basic level, because LoRA assumes an index mapping that only holds under the default scheduling policy. The side effect could be wrong output or a crash.

rkooo567 avatar Sep 18 '24 16:09 rkooo567
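To illustrate the kind of assumption being described, here is a simplified model (not vLLM's actual internals): the LoRA machinery needs one adapter-index mapping per input token and a separate one per sampled token. With chunked prefill, a partial prompt chunk contributes input tokens but no sampled token, which is exactly the "sampling is not required" case the original index logic missed:

```python
# Simplified illustration, not vLLM internals.
# Each scheduled sequence contributes (num_tokens, needs_sampling, adapter_id).
batch = [
    (5, False, 1),  # chunk of a longer prompt: prefill only, no sampling yet
    (3, True,  2),  # final prompt chunk: sampling required
    (1, True,  1),  # decode step: one token, sampling required
]

token_mapping = []    # adapter id per input token
sampler_mapping = []  # adapter id per *sampled* token
for num_tokens, needs_sampling, adapter_id in batch:
    token_mapping += [adapter_id] * num_tokens
    if needs_sampling:  # the case the original index logic missed
        sampler_mapping.append(adapter_id)

print(token_mapping)    # [1, 1, 1, 1, 1, 2, 2, 2, 1]
print(sampler_mapping)  # [2, 1] -- fewer entries than scheduled sequences
```

If the sampler-side mapping is instead built with one entry per sequence, it misaligns as soon as a partial-prefill chunk appears in the batch, which matches the wrong-output-or-crash symptoms above.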

> You mean "This also means LoRA is not working with sampling_params do_sample=False"? @rkooo567

Yes! That's right.

rkooo567 avatar Sep 18 '24 16:09 rkooo567

Will chunked prefill ever work with LoRA?

Nero10578 avatar Nov 12 '24 23:11 Nero10578

see https://github.com/vllm-project/vllm/pull/9057 @Nero10578

sfc-gh-zhwang avatar Nov 12 '24 23:11 sfc-gh-zhwang

@sfc-gh-zhwang is there a plan for when this feature will be merged?

mces89 avatar Dec 03 '24 00:12 mces89

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Mar 03 '25 02:03 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Apr 02 '25 02:04 github-actions[bot]