tvm [RELAX] Tuning capability for external cuBLAS codegen

[RELAX] Tuning capability for external cuBLAS codegen

Open apeskov opened this issue 1 year ago • 0 comments

This PR introduces an API for kernel tuning in external runtimes such as cuBLAS and CUTLASS.

It contains an initial implementation of a tuning algorithm for the cuBLAS runtime. By default, cuBLAS selects kernels with a heuristic-based approach, which can be suboptimal in some cases, especially for kernels with dynamic shapes. A predefined collection of kernel descriptors (cublasLtMatmulAlgo_t) can improve on the heuristic choice.

Examples of usage:

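# Tune kernel selection for the cuBLAS-offloaded parts of the module,
# then pass the resulting database to codegen.
mod = partition_for_cublas(mod)
db = TuneCodegenAlgo(mod, codegen_name="cublas")
mod = relax.transform.RunCodegen({"cublas": {"algo_db": db}})(mod)
ex = relax.build(mod, "cuda")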

# Load an algorithm database from a JSON file and use it for codegen.
with open("algo_db.json", "r") as f:
    db = AlgoDatabase.from_json(f.read())

mod = partition_for_cublas(mod)
mod = relax.transform.RunCodegen({"cublas": {"algo_db": db}})(mod)
ex = relax.build(mod, "cuda")
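
The tune-then-reuse idea behind the two snippets above can be illustrated without TVM or a GPU. The following is only a minimal sketch under stated assumptions: `ToyAlgoDatabase`, `synthetic_cost`, and the JSON layout are hypothetical names invented for illustration and are not the `AlgoDatabase` or `TuneCodegenAlgo` implementation from this PR. It shows one plausible shape of such a database: for each matmul shape, evaluate every candidate algorithm, record the cheapest one, and round-trip the result through JSON.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of a matmul


class ToyAlgoDatabase:
    """Hypothetical stand-in: maps a matmul shape to the id of its cheapest candidate."""

    def __init__(self, best: Optional[Dict[Shape, int]] = None):
        self.best: Dict[Shape, int] = dict(best or {})

    def tune(self, shape: Shape, candidates: Iterable[int],
             cost: Callable[[int, Shape], float]) -> None:
        # Evaluate every candidate for this shape and remember the cheapest one.
        costs = {algo_id: cost(algo_id, shape) for algo_id in candidates}
        self.best[shape] = min(costs, key=costs.get)

    def lookup(self, shape: Shape, default: int) -> int:
        # Untuned shapes fall back to the default (heuristic) choice.
        return self.best.get(shape, default)

    def to_json(self) -> str:
        # JSON keys must be strings, so encode the shape as "MxNxK".
        return json.dumps({"x".join(map(str, s)): a for s, a in self.best.items()})

    @classmethod
    def from_json(cls, text: str) -> "ToyAlgoDatabase":
        raw = json.loads(text)
        return cls({tuple(int(d) for d in k.split("x")): a for k, a in raw.items()})


def synthetic_cost(algo_id: int, shape: Shape) -> float:
    # Assumption for illustration only: algo 0 suits small M, algo 1 suits large M.
    # A real tuner would time the kernel on the device instead.
    preferred = 0 if shape[0] < 64 else 1
    return 1.0 if algo_id == preferred else 2.0


db = ToyAlgoDatabase()
for shape in [(1, 4096, 4096), (512, 4096, 4096)]:
    db.tune(shape, candidates=[0, 1, 2], cost=synthetic_cost)

# Persist the tuning result and restore it later, mirroring the second example.
restored = ToyAlgoDatabase.from_json(db.to_json())
```

Keying the database by concrete shape is a design choice of this sketch; it makes the dynamic-shape case explicit, since a shape seen only at run time either hits a tuned entry or falls back to the default.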

Mar 21 '24 15:03 apeskov
