Carsten Bauer

Results: 99 comments by Carsten Bauer

Thanks for the hints. Perhaps this is a better formulation of the benchmark challenge 😄: https://discourse.julialang.org/t/performance-optimisation-julia-vs-c/59689

The internals documentation seems like a good place. 👍

@JackDevine Totally fine with it! Great visualizations!

The initial naive attempt didn't work. For the following test kernel,

```julia
function kernel_wmma_tf32_lowlevel(a_dev, b_dev, c_dev, d_dev)
    a_frag = WMMA.llvm_wmma_load_a_col_m16n16k8_global_stride_tf32(pointer(a_dev), 16)
    b_frag = WMMA.llvm_wmma_load_b_col_m16n16k8_global_stride_tf32(pointer(b_dev), 8)
    c_frag = WMMA.llvm_wmma_load_c_col_m16n16k8_global_stride_f32(pointer(c_dev), 16)
    d_frag...
```

Thanks for your comments. I fixed (1):

```julia
function kernel_wmma_tf32_lowlevel(a_dev, b_dev, c_dev, d_dev)
    a_frag = WMMA.llvm_wmma_load_a_col_m16n16k8_global_stride_tf32(pointer(a_dev), 16)
    b_frag = WMMA.llvm_wmma_load_b_col_m16n16k8_global_stride_tf32(pointer(b_dev), 8)
    c_frag = WMMA.llvm_wmma_load_c_col_m16n16k8_global_stride_f32(pointer(c_dev), 16)
    d_frag = WMMA.llvm_wmma_mma_col_col_m16n16k8_f32_f32(a_frag, b_frag, c_frag)...
```

I should probably mention that I used Julia 1.7 above. With Julia 1.8-beta1, which uses LLVM 13 (instead of 12), the loads seem to pass and I get

```julia
julia>...
```

Good news: with the current state (and Julia >= 1.8), the errors from above are gone for this corrected example:

```julia
function kernel_wmma_tf32_lowlevel(a_dev, b_dev, c_dev, d_dev)
    a_frag = WMMA.llvm_wmma_load_a_col_m16n16k8_global_stride_tf32(pointer(a_dev), 16)...
```

Note that I see a similar (possibly the same) segfault with f64 WMMA: https://github.com/JuliaGPU/CUDA.jl/pull/1426#issuecomment-1057864919

> Try using an assertions build. That should become easier as soon as I tag the next version of LLVM.jl (today, normally).

When I use an assertions build of Julia...

Alright, I ran `] up`, and after that the assertions build didn't error. I get

```julia
julia> call_kernel()
julia: /workspace/srcdir/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1105: llvm::SDValue::SDValue(llvm::SDNode*, unsigned int): Assertion `(!Node || !ResNo || ResNo < Node->getNumValues())...
```