Josh Fromm
I'd like to write something like this ``` CONV_LANG = """ def convolution(float(N,C,H,W) I, float(M,C,KH,KW) W1) -> (Xout) {{ # binarize weight tensor bin_W(m, c, kh, kw) = W1(m, c,...
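As a plain-Python reference for checking the output of a kernel like this, here is a minimal sketch of a convolution with binarized weights. The binarization rule in the snippet above is truncated, so this sketch *assumes* sign binarization (weights >= 0 become +1, all others -1), stride 1, and no padding. The helper names `sign` and `binary_weight_conv` are hypothetical; only the shape names `N, C, H, W, M, KH, KW` and the tensor names `I`, `W1`, `bin_W` follow the snippet.

```python
def sign(x):
    # Assumed binarization: map each weight to +1.0 or -1.0 (0 maps to +1.0).
    return 1.0 if x >= 0 else -1.0


def binary_weight_conv(I, W1):
    """Valid (no padding), stride-1 convolution of I[N][C][H][W]
    with sign-binarized weights W1[M][C][KH][KW]; returns Xout[N][M][OH][OW]."""
    N, C, H, W = len(I), len(I[0]), len(I[0][0]), len(I[0][0][0])
    M, KH, KW = len(W1), len(W1[0][0]), len(W1[0][0][0])
    # Binarize the weight tensor once, up front.
    bin_W = [[[[sign(W1[m][c][kh][kw]) for kw in range(KW)]
               for kh in range(KH)]
              for c in range(C)]
             for m in range(M)]
    OH, OW = H - KH + 1, W - KW + 1
    Xout = [[[[0.0] * OW for _ in range(OH)] for _ in range(M)] for _ in range(N)]
    for n in range(N):
        for m in range(M):
            for h in range(OH):
                for w in range(OW):
                    # Reduce over input channels and the kernel window.
                    acc = 0.0
                    for c in range(C):
                        for kh in range(KH):
                            for kw in range(KW):
                                acc += I[n][c][h + kh][w + kw] * bin_W[m][c][kh][kw]
                    Xout[n][m][h][w] = acc
    return Xout


# One image, one channel, 3x3 input; one 2x2 filter that binarizes to [[1, 1], [-1, 1]].
I = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]
W1 = [[[[0.5, 0.2], [-3.0, 0.1]]]]
print(binary_weight_conv(I, W1))  # [[[[4.0, 6.0], [10.0, 12.0]]]]
```

A naive loop nest like this is slow, but it is easy to audit, which makes it a convenient oracle when comparing a compiled kernel's output on small inputs.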
For reference, here is the equivalent (fully tested and functional) Halide pipeline. ``` Var x("x"), y("y"), c("c"), k("k"); Func clamped; clamped(x, y, c) = BoundaryConditions::constant_exterior(input, 0)(x, y, c); Func binclamped;...
Looks great! It seems like a very clean way to approach the problem.
@benjaminfspector Thanks for the excellent thoughts and tips. At least in my case I am primarily interested in H100 with all the related features. For what it's worth, we've found...
I think this issue should be resolved in #2919. The quantization kernel in Triton was writing its output using the same strides as the input but returning a contiguous tensor. This...
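To make the failure mode concrete, here is a minimal pure-Python sketch. It is not the Triton kernel or the actual change in #2919. A hypothetical `quantize_kernel` stores each element at a flat offset computed from a given set of strides. If it uses the strides of a transposed (non-contiguous) input and the buffer is then read back as a contiguous tensor, the elements come out scrambled. Computing the store offsets from the output's own contiguous strides gives the right answer. The rounding "quantization" and all function names are illustrative assumptions.

```python
def offsets(shape, strides):
    # Flat buffer offset of every logical (i, j) element of a 2-D tensor.
    return {(i, j): i * strides[0] + j * strides[1]
            for i in range(shape[0]) for j in range(shape[1])}


def contiguous_strides(shape):
    # Row-major (C-contiguous) strides for a 2-D tensor.
    return (shape[1], 1)


def quantize_kernel(in_buf, shape, in_strides, out_strides, scale):
    # Hypothetical toy quantization: round(x / scale), element by element.
    # Loads use the input's strides; stores use whatever out_strides is passed.
    out_buf = [0] * (shape[0] * shape[1])
    in_off = offsets(shape, in_strides)
    out_off = offsets(shape, out_strides)
    for idx in in_off:
        out_buf[out_off[idx]] = round(in_buf[in_off[idx]] / scale)
    return out_buf


def read(buf, shape, strides):
    # Interpret a flat buffer as a 2-D tensor with the given strides.
    return [[buf[i * strides[0] + j * strides[1]] for j in range(shape[1])]
            for i in range(shape[0])]


# Logical 2x3 tensor [[1, 2, 3], [4, 5, 6]] stored transposed (column-major).
shape = (2, 3)
buf = [1, 4, 2, 5, 3, 6]
in_strides = (1, 2)
out_strides = contiguous_strides(shape)  # (3, 1)

# Bug: store with the input's strides, then hand back a "contiguous" tensor.
bad = quantize_kernel(buf, shape, in_strides, in_strides, 1.0)
print(read(bad, shape, out_strides))   # [[1, 4, 2], [5, 3, 6]]  (scrambled)

# Fix: store with the contiguous output's own strides.
good = quantize_kernel(buf, shape, in_strides, out_strides, 1.0)
print(read(good, shape, out_strides))  # [[1, 2, 3], [4, 5, 6]]
```

The key design point is that loads and stores each need their own stride set. Reusing the input strides only happens to work when the input is already contiguous, which is why a bug like this can go unnoticed until a transposed or sliced tensor is passed in.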