Meng Zhu
@WoosukKwon as discussed offline, created RFC #16144
Thanks for the comments, @comaniac @heheda12345! Re: LLM cache. Yeah, I think a lightweight solution where the user can just flip a flag to use the CPU memory without any...
> - the eviction algorithm for CPU blocks currently uses FIFO, which is basically unusable in production, this can be consistent with GPU blocks
> - add some metrics for CPU offloading

Glad...
> Hi [@mengzhu28](https://github.com/mengzhu28), thanks for working on this. I am curious if you have plans to finalize the change and the associated PR anytime soon?
>
> Besides, regarding having...