Pierre Chatelier

73 comments by Pierre Chatelier

I have a new issue, but I don't know whether it's related. Let me know if I should open a new issue. CUTLASS unexpectedly hangs after several calls...

OK, I got it. I have been tricked by using a stream. The hang is only delayed: it is simply the call to `implicit_gemm_op.run(stream);` that takes a few minutes...
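A timer or breakpoint placed around a launch on a stream can be misleading, because CUDA kernel launches return to the host before the kernel finishes, so the cost surfaces later at a synchronization point. The sketch below is a host-only analogy, not CUDA or CUTLASS code: `std::async` stands in for an asynchronous launch, the returned `std::future` stands in for the later synchronization, and the names `launch_async` and `time_launch_and_wait` are hypothetical.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Host-only analogy (hypothetical names): launch_async() plays the role of an
// asynchronous launch on a stream; waiting on the future plays the role of a
// later synchronization point.
struct Timings {
    double launch_ms;  // time spent inside the "launch" call
    double wait_ms;    // time spent waiting for the work to finish
};

inline std::future<void> launch_async(std::chrono::milliseconds work) {
    // The work runs on another thread, so this call returns almost immediately.
    return std::async(std::launch::async, [work] { std::this_thread::sleep_for(work); });
}

inline Timings time_launch_and_wait(std::chrono::milliseconds work) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    std::future<void> done = launch_async(work);
    const auto t1 = clock::now();
    done.wait();  // the cost of the work shows up here, not at the launch
    const auto t2 = clock::now();
    const std::chrono::duration<double, std::milli> launch = t1 - t0, wait = t2 - t1;
    return {launch.count(), wait.count()};
}
```

With `time_launch_and_wait(std::chrono::milliseconds(200))`, the launch time stays small while nearly all of the 200 ms appears in the wait, which is how a slow kernel can look like a hang at a later line.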

Ok. I have also been tricked by using a Debug build. For my 1111x1024 input convolved by a 21x1 filter:
- Release: 17 ms
- Debug: 153166 ms (about 9000 times slower)

Ouch.
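The slowdown figure is a plain ratio of the two timings. A trivial helper (the name `slowdown_factor` is hypothetical) makes the arithmetic explicit:

```cpp
// Ratio between two wall-clock timings of the same work,
// e.g. a Debug build versus a Release build of the same convolution.
inline double slowdown_factor(double slow_ms, double fast_ms) {
    return slow_ms / fast_ms;
}
```

`slowdown_factor(153166.0, 17.0)` gives about 9009.8, which matches the rough "9000 times slower" above.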

> Your C is only 1, which is bad for vectorized operations including mma. You can try to use fixed channel (https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/conv/convolution.h#L114). An example is in https://github.com/NVIDIA/cutlass/blob/main/test/unit/conv/device/conv2d_fprop_fixed_channels_f16nhwc_f16nhwc_f16nhwc_tensor_op_f32_sm80.cu

Does not seem to...
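One way to see why C = 1 is awkward is to write out the implicit GEMM that a forward convolution maps to. CUTLASS's implicit GEMM formulation for fprop uses GEMM-M = N·P·Q, GEMM-N = K and GEMM-K = C·R·S. The sketch below computes those extents; the struct and function names are hypothetical, and the example values assume the 1111x1024 input is H x W with N = 1, C = 1, a single output channel (K = 1), no padding and unit stride.

```cpp
#include <cstdint>

// Hypothetical helper: implicit GEMM extents for a 2D forward convolution
// (fprop), using GEMM-M = N*P*Q, GEMM-N = K, GEMM-K = R*S*C.
struct GemmShape {
    std::int64_t m, n, k;
};

struct Conv2dShape {
    int n, h, w, c;          // input batch, height, width, channels
    int k, r, s;             // output channels, filter height, filter width
    int pad_h, pad_w;        // zero padding on each side
    int stride_h, stride_w;  // convolution strides
};

inline int output_extent(int in, int filt, int pad, int stride) {
    // Standard output size formula, assuming dilation 1.
    return (in + 2 * pad - filt) / stride + 1;
}

inline GemmShape implicit_gemm_fprop(const Conv2dShape& p) {
    const std::int64_t P = output_extent(p.h, p.r, p.pad_h, p.stride_h);
    const std::int64_t Q = output_extent(p.w, p.s, p.pad_w, p.stride_w);
    return {p.n * P * Q, p.k, std::int64_t(p.r) * p.s * p.c};
}
```

For the assumed values it returns M = 1117184, N = 1 and K = 21: a very tall, one-column GEMM with a reduction of only 21, which leaves little room for the wide vector accesses and MMA tiles the quoted advice refers to.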

> What is after `cutlass::conv::kernel::DefaultConv2dFprop`?

I really need floats, because accuracy tests on my datasets showed me that half/bf16/tf32 were not precise enough for my data ranges. So my `cutlass::conv::kernel::DefaultConv2dFprop`...
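The precision gap can be illustrated by dropping float mantissa bits: float has 23 explicit mantissa bits, TF32 and half keep 10, and bfloat16 keeps 7 (half also has a narrower exponent range, which this sketch does not model). The helper name is hypothetical; it truncates toward zero for simplicity, whereas hardware conversions usually round to nearest, and it assumes finite inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep only the top `keep_bits` explicit mantissa bits of a finite float and
// zero the rest (truncation toward zero). keep_bits = 10 mimics the TF32
// significand, 7 the bfloat16 significand; the exponent is left untouched.
inline float truncate_mantissa(float x, int keep_bits) {
    assert(keep_bits >= 0 && keep_bits <= 23);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint32_t dropped = 23u - static_cast<std::uint32_t>(keep_bits);
    const std::uint32_t mask = ~((std::uint32_t{1} << dropped) - 1u);
    bits &= mask;  // clear the low `dropped` mantissa bits
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

For example, `truncate_mantissa(1.0001f, 10)` returns exactly `1.0f`: the 1e-4 detail is below the TF32 step of 2^-10 near 1.0, so it is lost entirely.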

Sure, that's exactly what I tried, but then the compilation fails. I think it's related to the fact that I use float, float, float, which currently implies `cutlass::arch::OpClassSimt` rather than `cutlass::arch::OpClassTensorOp` like...

Nope. On one hand:
- specialization works and compilation succeeds
- speed is significantly improved

but on the other hand, far more critically:
- now numerical results are...

Interesting technique, but I think I am doomed. In the end, CUTLASS does not seem to be the right tool for basic 2D convolution. I expected a speedup from the promise of...

A new failed attempt to produce a correct result :-(

What I did:
- replace my previous kernel with the 3xTF32 kernel example
- in the Epilogue, use 1...
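The 3xTF32 approach splits each float operand into a TF32 "big" part and a TF32 "small" remainder, then computes big·big + big·small + small·big with float accumulation, dropping the small·small term. The sketch below emulates this on the CPU to show why it recovers near-float accuracy; it is not the CUTLASS kernel, the function names are hypothetical, and truncation stands in for the rounding a real conversion would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Emulate TF32 operand precision by truncating a float to 10 explicit
// mantissa bits (real conversions typically round to nearest instead).
inline float to_tf32_trunc(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFE000u;  // clear the low 13 of the 23 mantissa bits
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// 1xTF32: both operands reduced to TF32, products accumulated in float.
inline float dot_1xtf32(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += to_tf32_trunc(a[i]) * to_tf32_trunc(b[i]);
    return acc;
}

// 3xTF32: split x = big + small (both TF32) and keep three of the four
// partial products; small*small is dropped as negligible.
inline float dot_3xtf32(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float a_big = to_tf32_trunc(a[i]);
        const float a_small = to_tf32_trunc(a[i] - a_big);
        const float b_big = to_tf32_trunc(b[i]);
        const float b_small = to_tf32_trunc(b[i] - b_big);
        acc += a_small * b_big;
        acc += a_big * b_small;
        acc += a_big * b_big;
    }
    return acc;
}
```

With a = b = {1.0001f}, `dot_1xtf32` returns 1.0 (an error of about 2e-4), while `dot_3xtf32` is within about 1e-8 of the double-precision product.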

> You also need to change the input alignment to be 1 too.

You are right, and now it works. Time for a recap. The goal is a basic 2D...
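A plausible reading of why alignment 1 is required here: vectorized loads run along the contiguous C dimension of NHWC tensors, so the access width in elements has to divide C, and with C = 1 only a width of 1 is possible. The helper below is hypothetical and assumes that rule together with a 128-bit maximum access; it is not a CUTLASS API.

```cpp
#include <cassert>

// Hypothetical helper: largest power-of-two access width (in elements) that
// fits in a 128-bit access and divides the channel count C, assuming vector
// loads run along the contiguous C dimension of NHWC data.
inline int max_alignment_elements(int channels, int element_bytes) {
    assert(channels > 0 && element_bytes > 0 && element_bytes <= 16);
    int alignment = 16 / element_bytes;  // 128 bits = 16 bytes
    while (alignment > 1 && channels % alignment != 0)
        alignment /= 2;
    return alignment;
}
```

`max_alignment_elements(1, 4)` returns 1 for a single float channel, while `max_alignment_elements(8, 4)` returns 4.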