HongyuChen

Results: 12 comments by HongyuChen

Thank you for your reply. By "single-stream computational pipeline", do you mean that the time spent loading model weights from HBM to the cache will be counted in the...

If my understanding is correct, does the example `offline_inference_distributed.py` in the documentation use data parallelism across nodes and tensor parallelism within each node?

> Hi lequn, I think I found the bug in cutlass_shrink.
>
> Please first see [cutlass example 24 group gemm](https://github.com/NVIDIA/cutlass/blob/a75b4ac483166189a45290783cb0a18af5ff0ea5/examples/24_gemm_grouped/gemm_grouped.cu#L1529). The second parameter for `LinearCombination` should be `128 /...

@yzh119 Thanks for the reply, bro. I tried a smaller tile size as you suggested, and the performance did improve (by around 20%). But this still doesn't perform well...

I defined a model myself and called bgmv in it for some LoRA computations, so `indices=-1` resulted in a CUDA error. > I don't think LoRA should be captured in...

Yes, indeed I don't want LoRA to be captured. I think my error was caused by my misuse of the bgmv kernel.
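To make the misuse concrete, here is a minimal Python sketch (my own illustration, not vLLM's actual kernel) of bgmv-style semantics, assuming the convention that an index of `-1` marks a row that should skip the LoRA computation; a kernel that does not guard against `-1` would instead use it as an array index and read out of bounds, which would surface as a CUDA error:

```python
# Reference semantics of a bgmv-style LoRA gather-matmul (pure-Python sketch).
# Assumption: indices[i] == -1 means "no LoRA adapter for this row",
# so the output row is left untouched.

def bgmv_reference(y, x, weights, indices):
    """Compute y[i] += x[i] @ weights[indices[i]] for rows with indices[i] >= 0."""
    for i, idx in enumerate(indices):
        if idx < 0:          # sentinel: skip LoRA for this row
            continue
        w = weights[idx]     # adapter matrix, shape [in_dim][out_dim]
        for j in range(len(w[0])):
            y[i][j] += sum(x[i][k] * w[k][j] for k in range(len(x[i])))

# Example: two tokens, one adapter; token 1 opts out with index -1.
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[0.0], [0.0]]
weights = [[[1.0], [1.0]]]   # one adapter mapping dim 2 -> dim 1
bgmv_reference(y, x, weights, indices=[0, -1])
# y[0] gets the LoRA contribution; y[1] is left unchanged
```

Under this convention, passing `-1` is only safe if the kernel checks for it before indexing the weight table.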

I'm currently using version 0.7.2; I think I'll try cudagraph for LoRA in version 0.8 in the future. Also, I'd like to ask a question: enabling cudagraph for LoRA doesn't...

> The 0.7.2 version should still be the V0 version of LoRA. For V0, vllm only captures cudagraph during the decode stage, and lora supports cudagraph, which you can confirm...

UPDATE: the run completes without reporting an error (the computation finishes), but `cutlass::reference::host::TensorEquals` fails.

> UPDATE: the running result: will not report error (computation is finished), but `cutlass::reference::host::TensorEquals` failed Maybe this is a numerical-accuracy issue rather than a correctness bug?
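One reason an exact element-wise comparison like `TensorEquals` can fail even when the kernel is correct is that a tiled GPU GEMM accumulates in a different order than a naive host reference, and floating-point addition is not associative. A small Python illustration (not CUTLASS code) of exact equality versus a relative-tolerance check:

```python
import math

# The same three values summed in two different orders round differently,
# which mirrors a tiled device GEMM versus a sequential host reference.
a = (0.1 + 0.2) + 0.3   # one accumulation order
b = 0.1 + (0.2 + 0.3)   # another order

print(a == b)                            # exact compare, like TensorEquals
print(math.isclose(a, b, rel_tol=1e-9))  # tolerance-based compare
```

If the mismatch is only at this magnitude, a relative/absolute-error check is usually the right verification, not bitwise equality.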