tvm
[RELAX] Tuning capability for external cuBLAS codegen
This PR introduces an API for kernel tuning in external runtimes such as cuBLAS and CUTLASS.
It contains an initial implementation of a tuning algorithm for the cuBLAS runtime. By default, cuBLAS selects kernels with a heuristic-based approach, which can be suboptimal in some cases, especially for kernels with dynamic shapes. Supplying a predefined collection of kernel descriptors (cublasLtMatmulAlgo_t) can improve the selection.
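To illustrate the general idea (not the implementation in this PR), here is a minimal sketch of an algorithm database: for each matmul shape, every candidate is measured and the cheapest one is recorded, and shapes that were never tuned fall back to the default heuristic. All names here (`tune`, `lookup`, `fake_measure`, the tile-size candidates) are hypothetical, and the cost model is a stand-in for real GPU timing.

```python
import json
import math
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of a matmul; hypothetical key layout


def shape_key(shape: Shape) -> str:
    """Serialize a shape into a JSON-friendly string key."""
    return "x".join(str(dim) for dim in shape)


def tune(
    shapes: Iterable[Shape],
    candidates: Sequence[int],
    measure: Callable[[Shape, int], float],
) -> Dict[str, int]:
    """Record, for every shape, the candidate with the lowest measured cost."""
    db: Dict[str, int] = {}
    for shape in shapes:
        costs = {algo: measure(shape, algo) for algo in candidates}
        db[shape_key(shape)] = min(costs, key=costs.get)
    return db


def lookup(db: Dict[str, int], shape: Shape) -> Optional[int]:
    """Return the tuned candidate, or None to signal 'use the heuristic'."""
    return db.get(shape_key(shape))


def fake_measure(shape: Shape, tile: int) -> float:
    """Stand-in cost model (an assumption, not real timing):
    rows are padded up to a multiple of the tile size, and larger
    tiles are assumed to be more efficient per padded row."""
    m, n, k = shape
    padded_m = math.ceil(m / tile) * tile
    return padded_m * n * k / math.sqrt(tile)


if __name__ == "__main__":
    db = tune([(16, 64, 64), (128, 64, 64)], [16, 64, 128], fake_measure)
    print(json.dumps(db, indent=2))    # the database is plain JSON
    print(lookup(db, (7, 7, 7)))       # untuned shape: fall back to heuristic
```

In a real tuner, `measure` would launch the kernel on the device and time it, and the candidates would be algorithm descriptors rather than tile sizes.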
Examples of usage:
# Example 1: tune the cuBLAS algorithms, then build with the resulting database.
mod = partition_for_cublas(mod)
db = TuneCodegenAlgo(mod, codegen_name="cublas")
mod = relax.transform.RunCodegen({"cublas": {"algo_db": db}})(mod)
ex = relax.build(mod, "cuda")

# Example 2: load a previously saved database from JSON, then build with it.
with open("algo_db.json", "r") as f:
    db = AlgoDatabase.from_json(f.read())
mod = partition_for_cublas(mod)
mod = relax.transform.RunCodegen({"cublas": {"algo_db": db}})(mod)
ex = relax.build(mod, "cuda")