Jez Ng

Results: 12 issues by Jez Ng

The autotuner is currently tied closely to the CUDA backend: it takes a bunch of CUDA-specific parameters and then passes them to `do_bench` or `do_bench_cudagraph`, which both call into many...

This makes the autotuner device-agnostic. Instead of having to know that e.g. `do_bench_cudagraph` exists, it lets callers decide which backend-specific benchmarking function to use. See discussion...
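
To illustrate the idea of letting callers choose the benchmarking function, here is a minimal sketch. It is not the actual autotuner code: `SimpleAutotuner`, `wall_clock_bench`, and the `do_bench` constructor argument are hypothetical names used only for this example, and the default benchmarker is a plain CPU wall-clock timer standing in for a real backend-specific one.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical contract: a benchmarker takes a zero-argument callable
# and returns its estimated runtime in milliseconds.
Benchmarker = Callable[[Callable[[], None]], float]


def wall_clock_bench(fn: Callable[[], None], warmup: int = 1, rep: int = 5) -> float:
    """Backend-neutral stand-in benchmarker based on wall-clock time."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - start) * 1e3 / rep


class SimpleAutotuner:
    """Toy autotuner that knows nothing about any particular device.

    The caller injects `do_bench`, so a CUDA caller could pass a
    CUDA-graph-based benchmarker while another backend passes its own.
    """

    def __init__(self, configs: Iterable[Dict[str, Any]],
                 do_bench: Benchmarker = wall_clock_bench) -> None:
        self.configs: List[Dict[str, Any]] = list(configs)
        self.do_bench = do_bench

    def pick_best(self, kernel: Callable[..., None]) -> Dict[str, Any]:
        """Time the kernel under every config and return the fastest one."""
        timings: List[Tuple[float, Dict[str, Any]]] = []
        for cfg in self.configs:
            # Bind cfg as a default argument so each closure keeps its own config.
            ms = self.do_bench(lambda cfg=cfg: kernel(**cfg))
            timings.append((ms, cfg))
        return min(timings, key=lambda t: t[0])[1]
```

With this shape, the device-specific choice lives entirely at the call site, for example `SimpleAutotuner(configs, do_bench=my_backend_bench)`, where `my_backend_bench` is whatever timing routine the caller's backend provides.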