Ruoyu Qin

7 comments by Ruoyu Qin

> The cache hit rate in the graph provided in the paper refers to the entire system. For example, in the case of GPU+CPU+Mooncake, the number of token hits in...

Hi @Asher-XunZhang. Thank you for your comprehensive proposal! I understand that building the Conductor is a large-scale effort, but also a highly meaningful one. I'd like to ask a few...

> > Hi [@Asher-XunZhang](https://github.com/Asher-XunZhang). Thank you for your comprehensive proposal! I understand that building the Conductor is a large-scale effort, but also a highly meaningful one. I'd like to ask...

> > There are lots of potential algorithms for the Global Scheduler; the most important thing is providing a way to look up the cache (i.e. `query_global_prefix_tree`). I suggest Mooncake provides...

@Boreas618 Hi, all the traces are from non-reasoning LLMs :)

> Strangely, I don't see a significant speedup on my side. Maybe it is because I use a machine with 2TB memory, and the OS is smart enough to cache the disk...

> @chestnut-Q thanks for the report. what's your disk io speed and cpu memory?

1TB cpu memory and ~5GB/s disk io