Yifan Qiao comments

Results: 20 comments by Yifan Qiao

Can we run Dorylus without ec2man?

Hi, thank you for trying Dorylus. Here are my answers to your questions: > Is it possible to run without using `ec2man`? Yes, you can. `ec2man` is only a helper...

What specific kind of Lambda trigger did you use as its event and why C++?

Hi Lorisyy, Thank you for your interest in Dorylus. I am glad to answer your questions. > How to trigger a Lambda? You are right that AWS Lambda is an...

Question about the loading of Friendster

Hi Tracy, Thank you for your interest in Dorylus. Yes, you are right that partitioning large graphs can take a huge amount of memory. I didn't have concrete memory consumption...

Error on gpt-oss with vLLM

Hi @jhsmith409, thanks for the issue and sorry it took a while to investigate. Regarding your issue, I think FlashAttention3 is a requirement on the vLLM side, so perhaps you will...

Dynamic Memory Management incurs error when running 4 instances on 2 * A100 GPUs

Thanks for the issue, and sorry for the delayed response. Let me start with the second issue. This is indeed a TP communication socket issue, and I actually just fixed it...

Hybrid models with more than one KV cache type are not supported yet

Hi @alecngo, Thanks for the issue! This is a known issue #202 and we are still working on it. For now you may have to use models with only full...

Hybrid models with more than one KV cache type are not supported yet

Hi @alecngo, I have been investigating this for a while and found a temporary workaround: please add `--disable-hybrid-kv-cache-manager` when launching the vLLM server. Let me know if it...

ValueError: Cannot get 31 free blocks from the pool

Thanks for the details and the log! I agree this is related to #191. I think this is likely due to a race condition when LLM engine A checks...

KVCached support for gpt-oss-20b attention type not supported in SGLang

Hi @deepak-vij, after some investigation, we now plan to support vLLM first by adding `--disable-hybrid-kv-cache-manager` when launching the vLLM server. Feel free to try it out and support SGLang if...

Two Qwen3-32B-FP8 instances on an H20 96G GPU using vLLM fail to process requests

It looks like the first engine has somehow taken most of the GPU memory (95% KV cache usage). In this case the second engine cannot allocate KV caches and hence cannot...