
CUDA: add CONV_3D operator support

Open · YaelGitAccount opened this pull request 1 month ago • 3 comments

Summary

Adds CUDA support for GGML_OP_CONV_3D, enabling full 3D convolution on NVIDIA GPUs with correct multi-dimensional indexing.
The implementation matches the CPU semantics exactly, including fused channel dimensions and nb[] byte-stride layout.
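
For context, ggml addresses tensor elements through per-dimension byte strides nb[], and the kernel reproduces that addressing on the GPU. A minimal illustration of the convention (standard ggml usage, not code from this PR):

```cpp
#include "ggml.h"

// ggml convention: element (i0, i1, i2, i3) of tensor t lives at
// data + i0*nb[0] + i1*nb[1] + i2*nb[2] + i3*nb[3], where nb[] are byte strides.
static inline float * tensor_elem_f32(const struct ggml_tensor * t,
                                      int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return (float *)((char *) t->data + i0*t->nb[0] + i1*t->nb[1] + i2*t->nb[2] + i3*t->nb[3]);
}
```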

Changes

  • Added conv3d.cu and conv3d.cuh with CUDA kernel and helpers
  • Added dispatch path in ggml-cuda.cu
  • Updated operator registration in ggml-cuda.cu
  • Updated docs/ops.md and docs/ops/CUDA.csv to include CONV_3D

Implementation

  • One CUDA thread per output element (batch × OC × OD × OH × OW); see the sketch after this list
  • Correct fused-dimension addressing:
    • Input: b * IC + ic
    • Kernel: oc * IC + ic
    • Output: b * OC + oc
  • Full nb[] stride-aware indexing matching CPU layout
  • Supports F32 input/output and F16/F32 kernel weights
  • Fully respects stride, padding, dilation, and 3D spatial dimensions
  • Follows existing CUDA backend structure and coding conventions
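
To make the mapping above concrete, here is a simplified sketch of the thread-to-output mapping and the fused-dimension addressing. It assumes contiguous F32 tensors; the actual kernel in conv3d.cu additionally walks the nb[] byte strides and handles F16 weights, so treat this as an illustration rather than the PR code:

```cuda
// One thread per output element; launched over N*OC*OD*OH*OW threads.
__global__ void conv3d_f32_sketch(
        const float * src,            // input:  [IW, IH, ID, IC*N] (fused ne3)
        const float * krn,            // kernel: [KW, KH, KD, IC*OC]
        float       * dst,            // output: [OW, OH, OD, OC*N]
        int N, int IC, int OC,
        int IW, int IH, int ID,
        int KW, int KH, int KD,
        int OW, int OH, int OD,
        int s0, int s1, int s2,       // stride   (w, h, d)
        int p0, int p1, int p2,       // padding  (w, h, d)
        int d0, int d1, int d2) {     // dilation (w, h, d)
    const long long n_out = (long long) N * OC * OD * OH * OW;
    const long long i     = blockIdx.x * (long long) blockDim.x + threadIdx.x;
    if (i >= n_out) return;

    // decompose the flat thread index into (b, oc, od, oh, ow)
    const int ow = (int)( i                                  % OW);
    const int oh = (int)((i /  (long long) OW)               % OH);
    const int od = (int)((i / ((long long) OW*OH))           % OD);
    const int oc = (int)((i / ((long long) OW*OH*OD))        % OC);
    const int b  = (int)( i / ((long long) OW*OH*OD*OC));

    float acc = 0.0f;
    for (int ic = 0; ic < IC; ++ic) {
        for (int kd = 0; kd < KD; ++kd) {
            const int id = od*s2 + kd*d2 - p2;
            if (id < 0 || id >= ID) continue;
            for (int kh = 0; kh < KH; ++kh) {
                const int ih = oh*s1 + kh*d1 - p1;
                if (ih < 0 || ih >= IH) continue;
                for (int kw = 0; kw < KW; ++kw) {
                    const int iw = ow*s0 + kw*d0 - p0;
                    if (iw < 0 || iw >= IW) continue;
                    // fused-channel addressing: input b*IC + ic, kernel oc*IC + ic
                    const long long src_idx = (((long long)(b*IC + ic)*ID + id)*IH + ih)*IW + iw;
                    const long long krn_idx = (((long long)(oc*IC + ic)*KD + kd)*KH + kh)*KW + kw;
                    acc += src[src_idx] * krn[krn_idx];
                }
            }
        }
    }
    // fused-channel addressing for the output: b*OC + oc
    const long long dst_idx = (((long long)(b*OC + oc)*OD + od)*OH + oh)*OW + ow;
    dst[dst_idx] = acc;
}
```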

Testing

  • All CONV_3D backend tests pass for CUDA (F32/F16 kernels, all shapes)
  • Numerical parity with CPU across all tested configurations
  • No regressions in CUDA backend test suite
  • Full backend test suite passes (no global regressions)

Compatibility

  • CUDA backend only
  • CPU path unchanged
  • No external dependencies added
  • Preserves GGML tensor layout conventions

YaelGitAccount · Nov 14 '25

This PR is ready for review. Tagging @CISC and @slaren: your feedback would be greatly appreciated whenever you have the chance. Thanks for your work on maintaining and improving the CUDA backend!

YaelGitAccount · Nov 14 '25

Unfortunately, the reason no backends support CONV_3D is that ggml_conv_3d uses the IM2COL_3D op instead, so CONV_3D is effectively an unused op.

CISC · Nov 14 '25

There is also https://github.com/ggml-org/llama.cpp/pull/16948. You can use the conv3d test program from that PR to compare performance.

Green-Sky · Nov 14 '25

Thanks for the review and the clarification!

If it makes sense for the project, I can follow up with a small PR that adds optional graph support for GGML_OP_CONV_3D. The idea would be:

• keep the existing IM2COL_3D + MUL_MAT lowering as the default path
• allow backends that explicitly report Conv3D support to receive a native Conv3D node
• measure the performance impact on CUDA (memory footprint, bandwidth, end-to-end time)

If the benchmarks show clear benefits for Conv3D on supported backends, then enabling the native path in more scenarios could be considered. Otherwise, the fallback path remains unchanged.

This keeps behavior stable while opening the door for backend-level optimizations, without committing the project to any change in graph lowering.
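
To make the proposal concrete, here is a rough sketch of what the opt-in could look like from the caller side. ggml_backend_supports_op is the existing capability query; ggml_conv_3d_direct is a hypothetical constructor name for a native GGML_OP_CONV_3D node, and the parameter lists below are schematic rather than the real signatures:

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical sketch only: pick the lowering for a 3D convolution based on
// whether the target backend reports native CONV_3D support.
// ggml_conv_3d_direct is an illustrative name, not existing API, and the
// parameter order is schematic (see ggml.h for the real signatures).
static struct ggml_tensor * conv_3d_with_fallback(
        struct ggml_context * ctx,
        ggml_backend_t        backend,
        struct ggml_tensor  * kernel,
        struct ggml_tensor  * input,
        int s0, int s1, int s2,       // stride   (w, h, d)
        int p0, int p1, int p2,       // padding  (w, h, d)
        int d0, int d1, int d2,       // dilation (w, h, d)
        int64_t IC, int64_t N, int64_t OC) {
    // build a native CONV_3D node (hypothetical constructor); if it ends up
    // unused it is simply never added to the graph
    struct ggml_tensor * direct = ggml_conv_3d_direct(ctx, kernel, input,
            s0, s1, s2, p0, p1, p2, d0, d1, d2, IC, N, OC);

    // backends that explicitly advertise CONV_3D keep the native node
    if (ggml_backend_supports_op(backend, direct)) {
        return direct;
    }

    // everyone else keeps today's IM2COL_3D + MUL_MAT lowering via ggml_conv_3d
    return ggml_conv_3d(ctx, kernel, input,
            s0, s1, s2, p0, p1, p2, d0, d1, d2, IC, N, OC);
}
```

In practice the branch would probably live behind the existing ggml_conv_3d entry point rather than a new helper, but the effect is the same: backends that report support get one fused node, everyone else keeps the current lowering.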

Let me know if this direction sounds reasonable — happy to iterate.

YaelGitAccount · Nov 16 '25