kvcached
[TODO] Support kvcached offloading to other storage like CPU memory
When GPU memory is almost full, kvcached could offload KV cache to CPU memory or even to disk. Open question: should this be done with CUDA UVM, or with more application-level semantics?
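To make the "application semantics" option more concrete, here is a minimal sketch of what an application-managed policy might look like: a bounded GPU tier that spills least-recently-used KV blocks to a CPU tier once a high watermark is crossed. Everything here is hypothetical and not part of kvcached: the `TieredKVStore` class, the `high_watermark` parameter, and the use of plain Python dicts in place of real device and host buffers are assumptions for illustration only.

```python
from collections import OrderedDict


class TieredKVStore:
    """Toy two-tier KV block store (hypothetical, not kvcached's API).

    The 'GPU' tier is bounded; the 'CPU' tier is treated as unbounded.
    Payloads are opaque Python objects standing in for KV cache blocks.
    """

    def __init__(self, gpu_capacity_blocks: int, high_watermark: float = 0.9):
        self.gpu_capacity_blocks = gpu_capacity_blocks
        self.high_watermark = high_watermark
        self.gpu = OrderedDict()  # block_id -> payload, least recently used first
        self.cpu = {}             # block_id -> payload

    def _offload_if_needed(self) -> None:
        # Spill LRU blocks to the CPU tier while the GPU tier is above the watermark.
        limit = int(self.gpu_capacity_blocks * self.high_watermark)
        while len(self.gpu) > limit:
            block_id, payload = self.gpu.popitem(last=False)
            self.cpu[block_id] = payload  # a real system would copy device -> host here

    def put(self, block_id, payload) -> None:
        self.cpu.pop(block_id, None)      # a block lives in exactly one tier
        self.gpu[block_id] = payload
        self.gpu.move_to_end(block_id)    # mark as most recently used
        self._offload_if_needed()

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        payload = self.cpu.pop(block_id)  # raises KeyError for unknown blocks
        self.put(block_id, payload)       # bring the block back, possibly evicting another
        return payload
```

The contrast with UVM is that the policy above decides explicitly which blocks move and when, using knowledge the application has (for example, which requests are idle), whereas with UVM the driver migrates pages on demand without that knowledge. Whether the extra control is worth the added complexity is exactly the open question in this issue.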