Update custom Triton kernel documentation and examples
### Description
Updates and improves the existing documentation for using custom Triton kernels as operators in ONNX Runtime. It also includes a small fix in the Python script for compiling Triton kernels, and allows compiling kernels outside of the Docker build step.
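For context, the kernels in question are ordinary Triton kernels written in Python. A minimal sketch of the kind of kernel the examples cover (the names here are illustrative, not the ones used in this PR):

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail of the last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```

A kernel like this is compiled ahead of time by the compile script and then launched from the operator's native implementation inside ORT.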
### Motivation and Context
I wanted to write my own ORT operators using Triton, but the only examples in the codebase cover adding a kernel to an existing TunableOp, with the code spread across a large number of files.
This is my best attempt at a more minimal example, with documentation of each step needed to do this independently.
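To make the example concrete, this is roughly how the finished operator would be exercised end to end. Everything here is a sketch: the op name `TritonAdd` and its domain are hypothetical stand-ins for whatever the example actually registers.

```python
import numpy as np
from onnx import TensorProto, helper
import onnxruntime as ort

# Build a one-node model that calls the custom op (hypothetical name/domain).
node = helper.make_node("TritonAdd", ["X", "Y"], ["Z"], domain="com.microsoft")
graph = helper.make_graph(
    [node], "triton_add_test",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1024]),
     helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1024])],
    [helper.make_tensor_value_info("Z", TensorProto.FLOAT, [1024])],
)
model = helper.make_model(graph, opset_imports=[
    helper.make_opsetid("", 17),
    helper.make_opsetid("com.microsoft", 1),
])

# Run it on the CUDA EP; this is the call that currently fails (see Note below).
sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CUDAExecutionProvider"])
x = np.random.rand(1024).astype(np.float32)
y = np.random.rand(1024).astype(np.float32)
(z,) = sess.run(None, {"X": x, "Y": y})
np.testing.assert_allclose(z, x + y, rtol=1e-6)
```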
### Note
Calling the operator defined here currently triggers a CUDA error:
```
2024-05-31 15:58:30.844121139 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/gpu_data_transfer.cc ; line=73 ; expr=cudaMemcpyAsync(dst_data, src_data, bytes, cudaMemcpyDeviceToHost, static_cast<cudaStream_t>(stream.GetHandle()));
2024-05-31 15:58:30.844186034 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=446 ; expr=cudaStreamSynchronize(static_cast<cudaStream_t>(stream_));
```
I'm opening this as a draft PR until I can fix this, and I'm hoping for help with it, as I assume it's just a small mistake in how I'm passing the CUDA stream to the kernel. If a draft in this state isn't welcome, I can close the PR until I have it fixed.
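For whoever picks this up: CUDA error 700 is sticky, so the two failures above (in `cudaMemcpyAsync` and `cudaStreamSynchronize`) are most likely just the first calls to observe a fault raised earlier by the kernel launch itself. One way a stream mistake can produce this pattern is launching the kernel on a different stream than the one ORT uses for its copies and synchronization. Purely as an illustration (reusing the `add_kernel` sketch above; `stream_ptr` stands in for the raw `cudaStream_t` the operator receives from ORT), the Python-side equivalent of doing it correctly looks like:

```python
import torch
import triton

# Stand-in for the raw cudaStream_t handle ORT hands to the operator;
# faked here with a fresh stream so the snippet runs standalone.
stream_ptr = torch.cuda.Stream().cuda_stream

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)

# Wrap the raw handle and make it current for the launch, so the kernel is
# ordered with the copies and synchronization issued on that same stream.
# Launching on the default stream instead can let the consumer read `out`
# before the kernel has written it, or free buffers the kernel still uses.
ort_stream = torch.cuda.ExternalStream(stream_ptr)
with torch.cuda.stream(ort_stream):
    # add_kernel is the Triton kernel sketched in the Description above.
    grid = (triton.cdiv(x.numel(), 1024),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```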