Hongtao Yu
> > > @ThomasRaoux It looks like the test failure is due to an unhandled case in wgmma to LLVM lowering, where the accumulator was zero. The zero accumulator was...
Thanks a lot for measuring perf for this change. At this point it appears that the heuristic needs more tweaks. > So one example should be easy to reproduce, for...
> > What is the rationale behind this? Is that because less-than-4-byte load per thread would result in a waste of a 128-byte memory transaction per warp? > > yes...
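The rationale above can be sketched with some back-of-the-envelope arithmetic. This is an illustrative model, not code from the patch: it assumes fully coalesced, aligned accesses, a 32-thread warp, and a 128-byte transaction granularity, and shows why per-thread loads narrower than 4 bytes leave part of each 128-byte transaction unused.

```python
# Sketch: fraction of a 128-byte memory transaction actually consumed
# by one warp, for different per-thread load widths. Assumes fully
# coalesced, aligned accesses; numbers are illustrative, not measured.

WARP_SIZE = 32           # threads per warp
TRANSACTION_BYTES = 128  # typical global-memory transaction granularity

def transaction_utilization(bytes_per_thread: int) -> float:
    """Useful bytes divided by total bytes moved for one warp-wide load."""
    useful = WARP_SIZE * bytes_per_thread
    # A warp triggers at least one full transaction even if it only
    # touches part of it; wider loads may need several transactions.
    transactions = max(1, useful // TRANSACTION_BYTES)
    return useful / (transactions * TRANSACTION_BYTES)

for width in (1, 2, 4, 8):
    print(f"{width} B/thread -> {transaction_utilization(width):.0%} utilized")
```

With 1- or 2-byte loads per thread the warp uses only 25% or 50% of the 128-byte transaction, while 4 bytes and above fully utilize every transaction, which is consistent with the heuristic's preference for at least 4-byte loads per thread.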
> I scan quickly through the code, could you add a high level comment explaining the heuristic as I'm not sure I fully understand from the code. Thanks for taking...
Summary updated to reflect the new heuristic.
> @htyu @ThomasRaoux is the optimization still on the menu? I am just learning how to auto-coalesce global accesses to SMEM (to make sure data is loaded and stored contiguously). >...
> This tanks some of our kernel's perf somehow :(. Did you run it on the pytorch benchmarks? Are there no regressions on the workloads you have? If not I'll have...
> > > This tanks some of our kernel's perf somehow :(. Did you run it on pytorch benchmarks? Is there no regressions on the workloads you have? If not...
@ThomasRaoux is there a way to share the benchmarks that regressed with this patch so I can investigate? We believe this could be a general fix and we would like to...
> I do think it is good patch in general but it would be great if you could fix the regression in torch benchmarks, then I can check if it...