Hongtao Yu
Thanks for working on this. Can I ask where you got the new configs and how they move the perf numbers? BTW, there are already-tuned numbers in the Triton...
> > Thanks for working on this.
> >
> > Can I ask where you got the new configs and how they move the perf numbers?
> >
> > BTW, there...
> Less than double what? Can we get a perf run of this before landing? The mm template is only used in a few OSS models, so I'd expect...
I'm sending out this diff to get early feedback. Regarding the performance testing, I'm still looking for memory-bound kernels with heavy computations. Please share if you have such kernels. The...
> `if loop_annotation || (matmul_loop && global_num_stage > 1)`

Sounds good to check against the annotation. How do you think we should handle matmul loops with extra loads? E.g., one...
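To restate the quoted condition as a small Python-style sketch (the names just mirror the pseudocode above, not the actual pass variables):

```python
# A minimal sketch of the quoted condition; names mirror the pseudocode above
# (loop_annotation, matmul_loop, global_num_stage), not the actual pass code.
def should_pipeline(loop_annotation: bool, matmul_loop: bool,
                    global_num_stage: int) -> bool:
    # An explicit per-loop annotation opts the loop in unconditionally;
    # otherwise fall back to the existing matmul heuristic gated on the
    # global num-stages setting.
    return loop_annotation or (matmul_loop and global_num_stage > 1)
```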
> > A load of an indexing tensor which is in turn used to load one dot operand is pipelined.
> >
> > Hey @htyu, I am finishing up PR...
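For context, here is a rough Triton sketch of the quoted pattern (kernel name, pointers, and strides are made up, and masking is omitted by assuming block-divisible shapes): the K-loop first loads an index tensor and then gathers one dot operand through it, so the index load is the extra load that would need to be pipelined along with the operand load.

```python
import triton
import triton.language as tl

@triton.jit
def gather_matmul_kernel(a_ptr, b_ptr, idx_ptr, c_ptr,
                         M, N, K,
                         stride_am, stride_ak,
                         stride_bk, stride_bn,
                         BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                         BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        offs_k = k + tl.arange(0, BLOCK_K)
        # Extra load: an indexing tensor loaded inside the matmul loop.
        rows = tl.load(idx_ptr + offs_k)
        # The A operand is a plain strided load.
        a = tl.load(a_ptr + offs_m[:, None] * stride_am
                    + offs_k[None, :] * stride_ak)
        # The B operand is gathered through the just-loaded indices, so
        # pipelining B also requires pipelining the index load above.
        b = tl.load(b_ptr + rows[:, None] * stride_bk
                    + offs_n[None, :] * stride_bn)
        acc += tl.dot(a, b)
    c_ptrs = c_ptr + offs_m[:, None] * N + offs_n[None, :]
    tl.store(c_ptrs, acc)
```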
> > I think we can support this feature; I was asking as the PR is out of date right now, but it is fine to me if we want...
I will rebase. The test case (in test/TritonGPU/loop-pipeline.mlir) has been updated with that loop annotation. Let me know if that looks good. Thanks.
Rebasing done.
> One final comment is that maybe we want to lift the logic out of `MatmulLoopPipeline` later since it's not "matmul" anymore?

It's a good point. Or maybe we could...