
Update custom Triton kernel documentation and examples

Open Numeri opened this issue 8 months ago • 8 comments

Description

This PR updates and improves the existing documentation for using custom Triton kernels as operators in ONNX Runtime. It also includes a small fix in the Python script for compiling Triton kernels, and allows compiling kernels outside of the Docker build step.

Motivation and Context

I wanted to write my own ORT operators using Triton, but the only examples in the codebase show how to add a kernel to an existing TunableOp, with the relevant code spread across a large number of files.

This is my best attempt at a more minimal example, with documentation of each step needed to do this independently.

Note

Calling the operator defined here currently triggers a CUDA error:

2024-05-31 15:58:30.844121139 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/gpu_data_transfer.cc ; line=73 ; expr=cudaMemcpyAsync(dst_data, src_data, bytes, cudaMemcpyDeviceToHost, static_cast<cudaStream_t>(stream.GetHandle())); 
2024-05-31 15:58:30.844186034 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=446 ; expr=cudaStreamSynchronize(static_cast<cudaStream_t>(stream_)); 

I'm opening this as a draft PR until I can fix this. I'm also hoping to get help with the fix, as I assume it's just a small mistake in how I'm passing the CUDA stream to the kernel. If this isn't welcome, I can close the PR until it's fixed.
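For reference, one common cause of CUDA error 700 in this setup is launching the kernel on a different stream (e.g. the default stream) than the one ORT later uses for `cudaMemcpyAsync`/`cudaStreamSynchronize`. Below is a minimal sketch of fetching the execution provider's compute stream inside a custom op and passing it to the launch. The AOT launcher name and signature (`launch_my_triton_kernel`) are hypothetical; the actual entry point generated by Triton's AOT compiler varies by version. This is a sketch, not the implementation in this PR.

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include "onnxruntime_cxx_api.h"

// Hypothetical Triton AOT-generated launcher: takes the CUDA stream as an
// explicit argument. The real generated signature may differ.
extern "C" CUresult launch_my_triton_kernel(CUstream stream,
                                            CUdeviceptr out, CUdeviceptr in,
                                            int32_t n);

struct MyTritonOpKernel {
  void Compute(OrtKernelContext* context) {
    Ort::KernelContext ctx(context);

    // Use the stream ORT schedules work on. Launching on the default stream
    // (nullptr) instead can trigger "illegal memory access" when ORT later
    // issues cudaMemcpyAsync/cudaStreamSynchronize on its own stream.
    auto* stream = reinterpret_cast<cudaStream_t>(ctx.GetGPUComputeStream());

    auto input = ctx.GetInput(0);
    auto shape = input.GetTensorTypeAndShapeInfo().GetShape();
    auto n = static_cast<int32_t>(
        input.GetTensorTypeAndShapeInfo().GetElementCount());
    auto output = ctx.GetOutput(0, shape);

    launch_my_triton_kernel(
        reinterpret_cast<CUstream>(stream),
        reinterpret_cast<CUdeviceptr>(output.GetTensorMutableRawData()),
        reinterpret_cast<CUdeviceptr>(const_cast<void*>(input.GetTensorRawData())),
        n);
  }
};
```

If the kernel is launched on ORT's compute stream this way, no extra synchronization should be needed inside `Compute`; ORT synchronizes the stream itself before copying outputs back to host.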

Numeri avatar May 31 '24 16:05 Numeri