Meng Zhu
Results: 2 issues of Meng Zhu
## TL;DR
In V1, swap GPU KV cache blocks to CPU upon eviction and swap them back if there's a cache hit.
## Swap Strategy
CPU → GPU swap-in happens...
v1
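The TL;DR above (swap out on eviction, swap back in on a cache hit) can be illustrated with a toy two-tier cache. This is a minimal sketch and not vLLM's implementation: the class `TwoTierKVCache`, its method names, and the LRU eviction policy are all assumptions made for illustration, and plain Python objects stand in for real KV tensors and device copies.

```python
from collections import OrderedDict


class TwoTierKVCache:
    """Hypothetical toy model of a GPU tier backed by a CPU tier.

    Blocks are keyed by a hash. The GPU tier is assumed to be LRU;
    evicted blocks are moved to the CPU tier instead of being dropped.
    """

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # least recently used entry comes first
        self.cpu = {}

    def put(self, block_hash, payload):
        """Place a block in the GPU tier, swapping out victims if full."""
        self.cpu.pop(block_hash, None)  # keep a block in one tier only
        self.gpu[block_hash] = payload
        self.gpu.move_to_end(block_hash)
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)
            self.cpu[victim] = data  # GPU -> CPU swap-out on eviction

    def get(self, block_hash):
        """Return a block, swapping it in from CPU on a CPU-tier hit."""
        if block_hash in self.gpu:
            self.gpu.move_to_end(block_hash)
            return self.gpu[block_hash]
        if block_hash in self.cpu:
            payload = self.cpu.pop(block_hash)
            self.put(block_hash, payload)  # CPU -> GPU swap-in
            return payload
        return None  # miss in both tiers: the caller must recompute
```

With `gpu_capacity=2`, inserting three blocks pushes the oldest one to the CPU tier, and a later `get` on that block moves it back to the GPU tier while evicting another block to CPU.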
### Motivation.
Offloading device KV cache to the CPU can be worthwhile if the transfer overhead is lower than the cost of re-computation, saving precious GPU cycles. This is especially useful in cases such...
RFC