Yiakwy

41 comments by Yiakwy

One thing I want to say is that multi-device support is largely backend specific: each backend has its own best strategy for doing the partition. You can add some attributes...
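A minimal sketch of the kind of attribute I mean (all names here are hypothetical, not any framework's real API): the op carries a partition hint, and each backend interprets it with its own strategy.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionAttrs:
    """Hypothetical per-op hints; each backend maps them to its own plan."""
    strategy: str = "replicate"   # e.g. "shard", "replicate", "pipeline"
    mesh_axes: tuple = ()         # device-mesh axes to shard over

@dataclass
class Op:
    name: str
    attrs: PartitionAttrs = field(default_factory=PartitionAttrs)

def lower(op: Op, backend: str) -> str:
    # Each backend picks its own best partition for the same attributes.
    if backend == "gpu" and op.attrs.strategy == "shard":
        return f"{op.name}: tensor-parallel over mesh axes {op.attrs.mesh_axes}"
    return f"{op.name}: run replicated on {backend}"

matmul = Op("matmul", PartitionAttrs(strategy="shard", mesh_axes=(0,)))
print(lower(matmul, "gpu"))
```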

> > Hi @poyenc, thanks for the reminder. Do you mean it is technically impossible to make it work for Navi, or is it just not on the official roadmap yet?...

@Hap-Zhang It might be related to the recently added chunked prefill feature. Please use "--enforce-eager" mode; vLLM graph compiling is broken. With this on, you should expect 4600 toks/s on H20 single...
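To make the workaround concrete, here is a minimal sketch using vLLM's offline API (the model name is only a placeholder); for the API server, the equivalent is passing `--enforce-eager` on the command line:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture entirely, sidestepping
# the broken graph-compiling path. Model name is just a placeholder.
llm = LLM(model="facebook/opt-125m", enforce_eager=True)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```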

> I am facing the same issue, and it was not resolved after adding --enforce_eager to the command. My flash-attn version is 2.4.2

@umechand-amd vLLM updates very quickly, I...

> The requirements.txt is used by cget (which is what rbuild uses) so users can install migraphx with its 3rd-party dependencies automatically.

Thank you for the instant response @pfultz2. So could we...

We want to see more improvement on the compiler side, since this is the major gap between vLLM and TRT-LLM (with the Myelin compiler). By the way, what's your opinion of SGLang (they extensively...

@BBuf Great job! I can help you compile the code on AMD chips to facilitate merging your algorithm.

@miladm The paged attention kernel will soon be eliminated in favor of flash attention in both the prefill and decoding stages. In that case, memory block management will be returned to the memory manager....
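A rough sketch of why the dedicated kernel becomes unnecessary, assuming a flash-attn build with paged-KV support (2.5+): flash attention can read a block-table-indexed KV cache directly, so block management stays with the memory manager while a single flash attention kernel serves both stages. Shapes and sizes below are illustrative only.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, head_dim = 2, 8, 128
num_blocks, block_size = 16, 256  # 256 keeps the page size safe across versions

# Paged KV cache: blocks owned by the memory manager, not the kernel.
k_cache = torch.randn(num_blocks, block_size, heads, head_dim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.randn_like(k_cache)

# block_table maps each sequence to its cache blocks (int32 indices).
block_table = torch.arange(batch * 2, dtype=torch.int32,
                           device="cuda").reshape(batch, 2)
cache_seqlens = torch.tensor([37, 52], dtype=torch.int32, device="cuda")

# Single decode step: one new query token per sequence.
q = torch.randn(batch, 1, heads, head_dim,
                dtype=torch.float16, device="cuda")

out = flash_attn_with_kvcache(q, k_cache, v_cache,
                              cache_seqlens=cache_seqlens,
                              block_table=block_table,
                              causal=True)
print(out.shape)  # (batch, 1, heads, head_dim)
```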

> "To meet the XLA’s static shape requirement, we will bucketize the possible input shapes. ...to reduce the number of compiled graphs" It is not XLA requirement, it is hardware...

> Here is the WIP PR for the PagedAttention kernel on Pallas + TorchXLA: [pytorch/xla#6912](https://github.com/pytorch/xla/pull/6912). We expect it to land pretty soon.
>
> cc @wonjoolee95

Do you have any...