TensorRT-LLM
PromptTuning can not work with block_reuse
Hi, I found that when I use prompt tuning, block_reuse seems not to work.

CUDA version: 12.2, TRT-LLM version: 0.9.0, device: A100, precision: FP16
For Yi-6B with 512 input tokens, 1 output token, and batch size 32:
- Without prompt tuning:
  - block_reuse disabled: 0.99 iter/s
  - block_reuse enabled: 3.00 iter/s
- With prompt tuning:
  - block_reuse disabled: 0.99 iter/s
  - block_reuse enabled: 0.99 iter/s
It seems the two features cannot work simultaneously. Could you please take a look? Thanks!
Yes, this is expected. Prompt tuning cannot currently be combined with block_reuse.
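To see why these features conflict, here is a toy sketch (not TensorRT-LLM code; all names are hypothetical) of prefix-based KV-block reuse. Block reuse keys cached KV blocks by the token IDs in a prompt prefix, while prompt tuning prepends learned virtual-token embeddings selected by a task ID. Since those embeddings change the attention states, blocks would have to be keyed per task, which defeats cross-request sharing:

```python
# Toy illustration of why prefix-based KV-block reuse breaks down with
# prompt tuning. This is a dict-based sketch, not the real engine.
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (toy value)

@dataclass
class KVCachePool:
    # maps (task_id, block of token IDs) -> cached flag
    blocks: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def lookup_or_insert(self, prompt_ids, task_id=None):
        """Walk the prompt block by block, reusing blocks whose key matches.

        `task_id` stands for a prompt-tuning table entry: its virtual
        tokens alter the KV states, so a correct cache must key blocks
        by (task_id, token_ids) and cannot share them across tasks.
        """
        for start in range(0, len(prompt_ids) - BLOCK_SIZE + 1, BLOCK_SIZE):
            key = (task_id, tuple(prompt_ids[start:start + BLOCK_SIZE]))
            if key in self.blocks:
                self.hits += 1
            else:
                self.blocks[key] = True
                self.misses += 1

pool = KVCachePool()
prompt = list(range(16))  # 16-token prompt = 4 blocks

# Plain requests: an identical prompt reuses every block the second time.
pool.lookup_or_insert(prompt)             # 4 misses (cold cache)
pool.lookup_or_insert(prompt)             # 4 hits
print(pool.hits, pool.misses)             # → 4 4

# Prompt-tuned requests: each task partitions the cache, so even an
# identical token prefix cannot be reused across tasks.
pool.lookup_or_insert(prompt, task_id=1)  # 4 misses
pool.lookup_or_insert(prompt, task_id=2)  # 4 misses again
print(pool.hits, pool.misses)             # → 4 12
```

In the sketch, the hit rate with prompt tuning stays at zero across tasks, which mirrors the benchmark above where enabling block_reuse brings no speedup once prompt tuning is active.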