Yifan Qiao

20 comments by Yifan Qiao

Hi, thank you for trying Dorylus. Here are my answers to your questions:

> Is it possible to run without using `ec2man`?

Yes, you can. `ec2man` is only a helper...

Hi Lorisyy, thank you for your interest in Dorylus. I am glad to answer your questions.

> How to trigger a Lambda?

You are right that AWS Lambda is an...

Hi Tracy, thank you for your interest in Dorylus. Yes, you are right that partitioning large graphs can take a huge amount of memory. I didn't have concrete memory consumption...

Hi @jhsmith409, thanks for the issue, and sorry it took a while to investigate. Regarding your issue, I think FlashAttention3 is a requirement on the vLLM side, so perhaps you will...

Thanks for the issue, and sorry for the delayed response. Let me start with the second issue. This is indeed a tensor-parallel (TP) communication socket issue, and I actually just fixed it...

Hi @alecngo, thanks for the issue! This is a known issue (#202) and we are still working on it. For now, you may have to use models with only full...

Hi @alecngo, I have been investigating this for a while and found a temporary workaround: please add `--disable-hybrid-kv-cache-manager` when launching the vLLM server. Let me know if it...
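As a rough illustration of that workaround, the sketch below assembles the server launch command from Python. It assumes vLLM's `vllm serve` entry point and its `--port` option; the model name, the port value, and the helper names `build_vllm_serve_cmd` and `launch` are hypothetical placeholders, not part of the original comment.

```python
import shlex
import subprocess


def build_vllm_serve_cmd(model: str, port: int = 8000) -> list[str]:
    """Hypothetical helper: build a `vllm serve` argv list.

    `model` and `port` are placeholders; only the
    `--disable-hybrid-kv-cache-manager` flag comes from the comment.
    """
    return [
        "vllm", "serve", model,
        "--port", str(port),
        # Temporary workaround suggested in the comment.
        "--disable-hybrid-kv-cache-manager",
    ]


def launch(model: str, port: int = 8000) -> subprocess.Popen:
    """Hypothetical helper: start the server as a child process (needs vLLM installed)."""
    return subprocess.Popen(build_vllm_serve_cmd(model, port))


if __name__ == "__main__":
    # Print the shell-equivalent command instead of running it.
    print(shlex.join(build_vllm_serve_cmd("your-org/your-model")))
```

Passing an argv list to `subprocess.Popen` (rather than a single shell string) avoids shell-quoting problems with model names or paths.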

Thanks for the details and the log! I agree this is related to #191. I think this is likely due to some race conditions when the LLM engine A checks...

Hi @deepak-vij, after some investigation, we now plan to support vLLM first by adding `--disable-hybrid-kv-cache-manager` when launching the vLLM server. Feel free to try it out and support SGLang if...

It looks like the first engine has somehow taken most of the GPU memory (95% KV cache usage). In this case, the second engine cannot allocate KV caches and hence cannot...