WeiHaocheng
> @WeiHaocheng @dc3671 @thorjohnsen Hi Fred/Zhenhuan/Thor
>
> Can you help review this PR from the community?
>
> Thanks June

Sure~ Let me review it~
It looks like we could store a KV cache block only when a new block is generated, and store only that new block rather than re-storing the last block every iteration. Let me talk with @narutolhy offline.
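To make the suggestion concrete, here is a rough sketch of "store only when a new block is generated". It is not the PR's code: `Request`, `store_new_full_blocks`, `simulate`, the dict-based `store`, and `TOKENS_PER_BLOCK = 4` are all hypothetical names and values picked for illustration.

```python
from dataclasses import dataclass, field

TOKENS_PER_BLOCK = 4  # hypothetical block size, for illustration only


@dataclass
class Request:
    tokens: list = field(default_factory=list)
    stored_blocks: int = 0  # full blocks already written to the reuse store


def store_new_full_blocks(req, store):
    """Store only blocks that became full since the last call.

    Returns the number of blocks written in this call.
    """
    full_blocks = len(req.tokens) // TOKENS_PER_BLOCK
    written = 0
    for idx in range(req.stored_blocks, full_blocks):
        end = (idx + 1) * TOKENS_PER_BLOCK
        # Key on the whole token prefix so a later request with the same
        # prefix can find this block (assumed lookup scheme).
        store[tuple(req.tokens[:end])] = idx
        written += 1
    req.stored_blocks = full_blocks
    return written


def simulate(num_tokens):
    """Generate one token per iteration and count store writes."""
    req, store, writes = Request(), {}, 0
    for tok in range(num_tokens):
        req.tokens.append(tok)
        writes += store_new_full_blocks(req, store)
    return writes, store
```

With this scheme, ten generation iterations at four tokens per block produce two store writes, one per completed block, instead of one write per iteration.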
@thorjohnsen Hi Thor~ Could you help review this PR?
> My one concern is that this might introduce significant CPU overhead. It looks like the last block of each generation request is stored in every iteration, so each block...
/bot reuse-pipeline