Eugene Zhulenev
@timshen91 Since you added `conv_autotuning.proto`, I have a question: how do you feel about logging `MatmulProto` to it? :) Specifically, with the concrete addresses of the inputs. (UPD: ok, given...
In XLA, inside loops (and inside control flow in general), we keep buffer offsets on device. This, for example, allows us to put two GEMMs writing at different offsets calculated at...
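A minimal host-side C++ sketch of the idea, assuming a naive row-major GEMM and ordinary host memory standing in for device memory (`GemmWithDeviceOffset` and `RunLoop` are hypothetical names, not XLA APIs). The output address is formed as `base + *offset`, and the offset is read when the GEMM executes rather than when it is recorded, so the same two launches write to different slices of one buffer on every iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive row-major GEMM: C[m x n] = A[m x k] * B[k x n].
// The output pointer is `c_base + *c_offset`; `c_offset` stands in for a scalar
// kept in device memory, so it is only dereferenced at execution time.
void GemmWithDeviceOffset(const float* a, const float* b, float* c_base,
                          const int32_t* c_offset, int m, int n, int k) {
  assert(*c_offset >= 0);
  float* c = c_base + *c_offset;
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  }
}

// Emulates a loop body with two GEMMs whose output offsets are recomputed
// from the loop counter on each iteration. Returns the whole output buffer.
std::vector<float> RunLoop(const std::vector<float>& a,
                           const std::vector<float>& b, int m, int n, int k,
                           int iterations) {
  std::vector<float> out(static_cast<std::size_t>(iterations) * 2 * m * n, 0.0f);
  // In XLA these offsets live on device; plain variables stand in for them here.
  int32_t offset0 = 0;
  int32_t offset1 = 0;
  for (int it = 0; it < iterations; ++it) {
    offset0 = (2 * it) * m * n;
    offset1 = (2 * it + 1) * m * n;
    GemmWithDeviceOffset(a.data(), b.data(), out.data(), &offset0, m, n, k);
    GemmWithDeviceOffset(a.data(), b.data(), out.data(), &offset1, m, n, k);
  }
  return out;
}
```

With `B` set to the 2x2 identity, every 4-element slice of the output equals `A`, which makes it easy to see that each launch landed at its own offset.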
Well… it doesn't, which is why I'm looking at adding CUTLASS :) It does it for non-GEMM computations by compiling kernels, but for cuBLAS, for example, we are forced to materialize...
I'm also considering keeping it in XLA as a template specialization, since this is a little too XLA-specific (especially the `int32_t` offsets; in general int64 makes more sense, but harder...
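One way such a specialization could look, as a sketch only: it is not the code from the XLA PR, and `OffsetPointer`, `PointerResolver`, and `SumFirstN` are hypothetical names. Generic kernel code resolves each operand through a traits template; the specialization for raw pointers returns them unchanged, and the specialization for base-plus-offset pointers adds the run-time offset. The offset type (`int32_t` or `int64_t`) stays a template parameter.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical pointer wrapper: the address is only known as `base + *offset`,
// where `offset` points at a value written at run time (device memory in XLA).
template <typename T, typename OffsetT>
struct OffsetPointer {
  static_assert(std::is_integral<OffsetT>::value, "offset must be an integer");
  T* base;
  const OffsetT* offset;  // offset in elements, read when the pointer is resolved
};

// Traits used by generic "kernel" code to turn any operand into a raw pointer.
template <typename Ptr>
struct PointerResolver;  // primary template intentionally left undefined

// Specialization for plain pointers: nothing to resolve.
template <typename T>
struct PointerResolver<T*> {
  static T* Resolve(T* p) { return p; }
};

// Specialization for base + run-time offset, for any integral offset type.
template <typename T, typename OffsetT>
struct PointerResolver<OffsetPointer<T, OffsetT>> {
  static T* Resolve(const OffsetPointer<T, OffsetT>& p) {
    return p.base + static_cast<std::ptrdiff_t>(*p.offset);
  }
};

// Generic code written once; it works for both pointer kinds.
template <typename Ptr>
double SumFirstN(const Ptr& p, int n) {
  const auto* data = PointerResolver<Ptr>::Resolve(p);
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += data[i];
  return sum;
}
```

Keeping the offset type as a parameter lets an XLA-specific instantiation use `int32_t`, while a more general library could default to a 64-bit offset without changing the generic code.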
I implemented this inside XLA with template specializations here: https://github.com/openxla/xla/pull/7916, so I don't need it in CUTLASS right now, but in general I think it would be very useful if...
This is a general feature that we'd need for inputs and outputs (epilogues): we know the "base" address at run time when we prepare TMA descriptors (when they are initialized...
My end goal is to be able to compile host-side and device-side code for this H100 GEMM:

```
///////////////////////////////////////////////////////////////////////////////////////////////////

#include "cutlass/cutlass.h"
#include "cutlass/library/library.h"
#include "cutlass/library/manifest.h"
#include "library_internal.h"
#include...
```
- Clang: `clang version 18.0.0git (https://github.com/llvm/llvm-project.git a855b2c894444419c3689aff6fd0381fdeb02491)`
- CUDA: 12.2
- CUTLASS: top of main branch
Do you have any benchmarks for this change? I recall running synthetic benchmarks, and graph update was so cheap that I decided it wasn't worth bothering with, certainly not...
Sounds good! If it's indeed expensive, let's find a solution. FYI, I remembered that I have microbenchmarks here: https://github.com/openxla/xla/blob/main/xla/stream_executor/gpu/gpu_command_buffer_test.cc#L1274-L1356; you can add something more representative there, although it's part...
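For a quick local check before adding a case to the linked test file, a framework-free timing helper is often enough. This is a minimal sketch built on `std::chrono::steady_clock`. It does not reproduce the linked microbenchmarks, and `MeanNanosPerCall` is a hypothetical helper; the operations passed in (for example a command-buffer update versus a full re-record) are left to the caller.

```cpp
#include <chrono>
#include <functional>

// Runs `op` `warmup` times without timing it, then `iters` times under
// std::chrono::steady_clock, and returns the mean wall-clock time per call
// in nanoseconds (0.0 if `iters` is not positive).
double MeanNanosPerCall(const std::function<void()>& op, int warmup, int iters) {
  for (int i = 0; i < warmup; ++i) op();
  if (iters <= 0) return 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) op();
  const auto end = std::chrono::steady_clock::now();
  const double total_ns =
      std::chrono::duration<double, std::nano>(end - start).count();
  return total_ns / iters;
}
```

Timing an update and a re-record of the same graph with this helper and comparing the two means gives a first-order answer to whether update cost matters. Wall-clock means are noisy, so repeating the run a few times is advisable.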