Nicolas Macchioni

Results 6 issues of Nicolas Macchioni

Differential Revision: D57371634. We can save a significant amount of benchmarking time in max-autotune-gemm mode if we group the benchmarking of Triton templates and back out early from templates that don't...

fb-exported
module: inductor
ciflow/inductor

Add an option to switch the Triton hash key to a more verbose output that can help with performance debugging; the hash key now includes Triton template configs like BLOCK_M, BLOCK_N,...

module: inductor
ciflow/inductor
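
To illustrate the verbose-key idea from the description above, here is a minimal sketch. It assumes a hypothetical helper `template_key` and a plain dict of config values; neither the function name nor the key layout comes from Inductor itself.

```python
import hashlib


def template_key(name, config, verbose=False):
    """Build a cache key for a template config (hypothetical helper, not Inductor's API)."""
    # Sort the config entries so the key does not depend on dict insertion order.
    parts = [f"{k}={config[k]}" for k in sorted(config)]
    readable = name + "_" + "_".join(parts)
    if verbose:
        # Verbose mode: keep the config values visible for performance debugging.
        return readable
    # Default mode: a short opaque digest of the same information.
    return hashlib.sha256(readable.encode()).hexdigest()[:16]


if __name__ == "__main__":
    cfg = {"BLOCK_M": 128, "BLOCK_N": 64}
    print(template_key("mm", cfg, verbose=True))
    print(template_key("mm", cfg))
```

With the verbose key, a slow kernel in a profile can be traced back to its block sizes by name alone, at the cost of longer identifiers.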

I'm currently working on reducing Inductor's compile time overhead in max-autotune-gemm mode. As part of this effort, I profiled some individual matmul autotunings and noticed that `do_bench` was particularly expensive....

Fixes #ISSUE_NUMBER cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

module: inductor
ciflow/inductor

Copy/pasted the estimation loop from `do_bench` into `do_bench_cudagraph`, in place of the original create graph -> measure replay methodology. Creating a graph is expensive (~300ms on A100 for me), even...
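
The general shape of an estimation loop can be sketched as follows. This is a CPU-only illustration under stated assumptions: it uses wall-clock time from `time.perf_counter` rather than CUDA events or graphs, and the function name `bench_with_estimate` and its parameters are hypothetical, not the actual `do_bench` signature.

```python
import time


def bench_with_estimate(fn, warmup_ms=25.0, rep_ms=100.0, n_estimate=5):
    """Time fn, sizing the run counts from a quick estimate (illustrative sketch only)."""
    # Step 1: a handful of runs gives a rough per-call cost in milliseconds.
    start = time.perf_counter()
    for _ in range(n_estimate):
        fn()
    estimate_ms = (time.perf_counter() - start) * 1e3 / n_estimate
    # Guard against a zero or near-zero estimate for very cheap callables.
    estimate_ms = max(estimate_ms, 1e-3)

    # Step 2: convert the time budgets into iteration counts.
    n_warmup = max(1, int(warmup_ms / estimate_ms))
    n_repeat = max(1, int(rep_ms / estimate_ms))

    # Step 3: warm up, then measure the mean over n_repeat calls.
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_repeat):
        fn()
    mean_ms = (time.perf_counter() - start) * 1e3 / n_repeat
    return mean_ms, n_repeat


if __name__ == "__main__":
    mean_ms, n = bench_with_estimate(lambda: sum(range(10_000)), warmup_ms=5, rep_ms=20)
    print(f"{mean_ms:.4f} ms per call over {n} runs")
```

The point of the estimate is that the run counts adapt to the cost of `fn`, so slow candidates are not repeated more often than the time budget allows.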

Summary: The `should_pad_common` and `should_pad_bench` logic was semi-intertwined, which can make working with the padding logic difficult. Previously there was no clear delineation as to what logic belonged in which of...

fb-exported
ciflow/trunk
topic: not user facing
module: inductor
ciflow/inductor
meta-exported