WeiHaocheng

Results 19 comments of WeiHaocheng

> @WeiHaocheng @dc3671 @thorjohnsen Hi Fred/Zhenhuan/Thor
>
> Can you help review this PR from the community?
>
> Thanks June

Sure~ Let me review it~

It looks like we could store a KV cache block only when a new block is generated, and store only that new block. Let me talk with @narutolhy offline.
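A rough sketch of that idea, not the PR's actual code: it assumes a block is ready to store once it fills up (that is, when generation rolls over into a new block), and every name here (`Request`, `blocks_to_store`, `TOKENS_PER_BLOCK`, `simulate`) is hypothetical and only for illustration.

```python
from dataclasses import dataclass, field

TOKENS_PER_BLOCK = 4  # hypothetical block size, chosen only for the example


@dataclass
class Request:
    tokens: list = field(default_factory=list)
    stored_blocks: int = 0  # number of full blocks already stored for reuse


def blocks_to_store(req: Request, tokens_per_block: int = TOKENS_PER_BLOCK) -> list:
    """Return indices of full blocks that have not been stored yet."""
    full_blocks = len(req.tokens) // tokens_per_block
    new_blocks = list(range(req.stored_blocks, full_blocks))
    req.stored_blocks = full_blocks  # remember what has been stored
    return new_blocks


def simulate(num_tokens: int) -> list:
    """Generate num_tokens tokens and record (step, block_index) store calls."""
    req = Request()
    store_calls = []
    for step in range(num_tokens):
        req.tokens.append(step)  # one generated token per iteration
        for block_index in blocks_to_store(req):
            store_calls.append((step, block_index))
    return store_calls


if __name__ == "__main__":
    # Stores happen only on the iterations where a block fills up.
    print(simulate(10))
```

With this bookkeeping, generating 10 tokens triggers two store calls instead of ten, since most iterations find no newly completed block.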

@thorjohnsen Hi Thor~ Could you help review this PR?

> My one concern is that this might introduce significant CPU overhead. It looks like the last block of each generation request is stored in every iteration, so each block...