Yifan Qiao
A comment on Reddit mentioned that Ollama and LM Studio are actually based on llama.cpp with minor modifications. So maybe we can look into the delta there and hopefully the...
Thanks for the issue. I edited the log format a little bit for better presentation. This could be related to FP8 data type. Please use BF16 or FP16 while we...
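For illustration, a minimal sketch of switching off FP8 as a workaround, assuming the engine is vLLM (the model name is a placeholder):

```python
from vllm import LLM

# Force BF16 weights/activations instead of FP8 while the FP8 path is investigated.
# "float16" works the same way if FP16 is preferred.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", dtype="bfloat16")
```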
Thanks for digging into the code! We totally agree that quantization is a must. We'd love to collaborate if you are interested in helping with the integration. Please feel free...
Thanks @alecngo! I would suggest using a different IPC_NAME for each engine. `kvcached_mem_info` is the default IPC name and can cause conflicts when shared. This could be because kvcached assumes co-running engines must...
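Roughly what that would look like, as a sketch only: the environment variable name `KVCACHED_IPC_NAME` below is an assumption for illustration, not confirmed kvcached API.

```python
import os

# Hypothetical: give each co-running engine its own IPC name before engine startup,
# so they don't collide on the default "kvcached_mem_info" shared-memory segment.
os.environ["KVCACHED_IPC_NAME"] = "kvcached_mem_info_engine0"  # engine 1 would use "..._engine1"
```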
Oh, I think our version detection does not cover 0.8.5.post1, but that can be fixed quickly. Will update shortly.
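For reference, a small sketch of how post-release tags behave under PEP 440 parsing with the `packaging` library (not kvcached's actual detection code), which is one way to make range checks robust to suffixes like `.post1`:

```python
from packaging.version import Version

# Post-releases like "0.8.5.post1" compare as >= the base "0.8.5",
# so a range check keeps working even when exact string matching would not.
installed = Version("0.8.5.post1")
print(installed >= Version("0.8.5"))  # True
print(installed.base_version)         # "0.8.5" -- handy for lookups keyed by base version
```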
Thank you @inforly for the issue! This should be fixed by #194. We'd appreciate it if you could give it another try. The fix is currently on the main branch,...
Great to see it running! I assume you set a static gpu_mem_utilization (e.g., 0.4 for each model) when disabling kvcached. If so, similar performance is expected, because the main goal...
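A minimal sketch of that static-partitioning baseline, assuming vLLM (model names are placeholders): each co-located model gets a fixed slice of GPU memory up front, so neither can borrow the other's idle KV-cache space.

```python
from vllm import LLM

# Static split: 0.4 + 0.4 of GPU memory, reserved regardless of actual load.
model_a = LLM(model="meta-llama/Llama-3.1-8B-Instruct", gpu_memory_utilization=0.4)
model_b = LLM(model="Qwen/Qwen2.5-7B-Instruct", gpu_memory_utilization=0.4)
```

With kvcached enabled, that reservation can instead grow and shrink with demand, which matters under bursty or imbalanced traffic rather than in a steady-state benchmark.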
Thank you @robertgshaw2-redhat! Have followed up over email and look forward to exploring potential integration together.
Thanks for the issue! It seems kvcached initially saw enough available memory but then found it insufficient during the actual allocation. To help debug, could you let us...
Replied in #197. I agree they are the same issue and I still highly suspect this is a race condition. Will follow up there.