Yiakwy

41 comments by Yiakwy

One thing I want to say is that multi-device support is largely backend specific: each backend has its own best strategy for doing the partition. You can add some attributes...
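A minimal sketch of the kind of attribute I mean (all names here are hypothetical, not any framework's real API): the op carries a partition hint, and each backend interprets it with its own strategy.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionAttrs:
    """Hypothetical per-op hints; each backend maps them to its own plan."""
    strategy: str = "replicate"   # e.g. "shard", "replicate", "pipeline"
    mesh_axes: tuple = ()         # device-mesh axes to shard over

@dataclass
class Op:
    name: str
    attrs: PartitionAttrs = field(default_factory=PartitionAttrs)

def lower(op: Op, backend: str) -> str:
    # Each backend picks its own best partition for the same attributes.
    if backend == "gpu" and op.attrs.strategy == "shard":
        return f"{op.name}: tensor-parallel over mesh axes {op.attrs.mesh_axes}"
    return f"{op.name}: run replicated on {backend}"

matmul = Op("matmul", PartitionAttrs(strategy="shard", mesh_axes=(0,)))
print(lower(matmul, "gpu"))
```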

> > Hi @poyenc, thanks for the reminder. Do you mean it is technically impossible to make it work for Navi, or is it just not on the official roadmap yet?...

@Hap-Zhang It might be related to the recently added chunked prefill feature. Please use "--enforce-eager" mode; vLLM graph compiling is broken. With this on, you should expect 4600 toks/s on H20 single...
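To make the workaround concrete, here is a minimal sketch using vLLM's offline API (the model name is only a placeholder); for the API server, the equivalent is passing `--enforce-eager` on the command line:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture entirely, sidestepping
# the broken graph-compiling path. Model name is just a placeholder.
llm = LLM(model="facebook/opt-125m", enforce_eager=True)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```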

> I am facing the same issue, and it was not resolved after adding --enforce_eager to the command. My flash-attn version is 2.4.2

@umechand-amd vLLM updates very quickly, I...

> The requirements.txt is used by cget (which is what rbuild uses) so users can install migraphx with its 3rd-party dependencies automatically.

Thank you for the instant response @pfultz2. So could we...

We want to see more improvement on the compiler side, since this is the major gap between vLLM and TRT-LLM (with the Myelin compiler). By the way, what's your opinion of SGLang (they extensively...

@BBuf Great job! I can help you compile the code on AMD chips to facilitate merging your algorithm.

@miladm The paged attention kernel will soon be eliminated in favor of flash attention in both the prefill and decoding stages. In that case, memory block management will be returned to the memory manager....
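A rough sketch of why the dedicated kernel becomes unnecessary, assuming a flash-attn build with paged-KV support (2.5+): flash attention can read a block-table-indexed KV cache directly, so block management stays with the memory manager while a single flash attention kernel serves both stages. Shapes and sizes below are illustrative only.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, head_dim = 2, 8, 128
num_blocks, block_size = 16, 256  # 256 keeps the page size safe across versions

# Paged KV cache: blocks owned by the memory manager, not the kernel.
k_cache = torch.randn(num_blocks, block_size, heads, head_dim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.randn_like(k_cache)

# block_table maps each sequence to its cache blocks (int32 indices).
block_table = torch.arange(batch * 2, dtype=torch.int32,
                           device="cuda").reshape(batch, 2)
cache_seqlens = torch.tensor([37, 52], dtype=torch.int32, device="cuda")

# Single decode step: one new query token per sequence.
q = torch.randn(batch, 1, heads, head_dim,
                dtype=torch.float16, device="cuda")

out = flash_attn_with_kvcache(q, k_cache, v_cache,
                              cache_seqlens=cache_seqlens,
                              block_table=block_table,
                              causal=True)
print(out.shape)  # (batch, 1, heads, head_dim)
```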

> "To meet the XLA’s static shape requirement, we will bucketize the possible input shapes. ...to reduce the number of compiled graphs" It is not XLA requirement, it is hardware...

> Here is the WIP PR for the PagedAttention kernel on Pallas + TorchXLA: [pytorch/xla#6912](https://github.com/pytorch/xla/pull/6912). We expect it to land pretty soon.
>
> cc @wonjoolee95

Do you have any...