Getting started with Triton for a custom hardware backend
I am looking for some pointers to get started with leveraging Triton to generate kernels for a custom hardware backend.
I see there have been efforts to lower PyTorch to Triton kernels for a CPU backend, but I was hoping someone could share some tips on the moving parts involved in doing the same for custom hardware.
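To be concrete about what I mean by "lowering PyTorch to Triton kernels": on GPUs, `torch.compile` with the default Inductor backend already does this, and the generated Triton source can be inspected with `TORCH_LOGS="output_code"`. A minimal sketch (the function `f` and the tensor sizes here are just placeholders for illustration):

```python
# Run with TORCH_LOGS="output_code" to print the Inductor-generated Triton kernels.
import torch

def f(x, y):
    return torch.relu(x + y)

compiled_f = torch.compile(f)  # Inductor backend by default

if torch.cuda.is_available():
    x = torch.randn(1 << 20, device="cuda")
    y = torch.randn(1 << 20, device="cuda")
    out = compiled_f(x, y)  # first call triggers compilation to Triton kernels
```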
Our compiler stack can already lower a PyTorch graph to our hardware ISA, but right now we are writing some custom kernels by hand, and we would like to use Triton to generate them instead.
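For reference, this is roughly the kind of Triton kernel we would want to be able to compile for our hardware. It is a minimal sketch based on the standard Triton vector-add tutorial (block size and grid are arbitrary), not anything specific to our ISA:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

Any pointers on which parts of the Triton compiler (frontend, middle-end passes, backend code generation) we would need to hook into for a new target would be much appreciated.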