Haicheng Wu

Results: 323 comments by Haicheng Wu

If your input is fp32, you need to design the kernel from scratch. The underlying kernel in this example does not support fp32.

Could you please share your cmake command? Also, we have known issues on Windows. Have you tried Linux? cc @lsyyy666

What do you mean by a list of tensors? In this example, you just need to pass pointers to the beginning of each tensor to the kernel. These tensors do not...
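To illustrate "pass the pointers to the beginning of the tensors", here is a minimal plain C++ sketch. It is not the code from the example under discussion: `scale_add` and `run_scale_add` are hypothetical names, and the loop is a host-side stand-in for a device kernel. The only point is that the callee receives the base address of each buffer plus an element count, not a container or a list object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel entry point (not a CUTLASS API).
// It sees only the address of the first element of each tensor and
// the number of elements to process.
void scale_add(const float* a, const float* b, float* out,
               std::size_t n, float alpha) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * a[i] + b[i];
    }
}

// Hypothetical host-side helper: owns the storage and hands the base
// pointers to the callee through data(). Assumes a and b have the
// same number of elements.
std::vector<float> run_scale_add(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 float alpha) {
    std::vector<float> out(a.size());
    scale_add(a.data(), b.data(), out.data(), a.size(), alpha);
    return out;
}
```

With device memory the same pattern would apply, with the pointers coming from a device allocation instead of `std::vector::data()`.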

sm120 does not support `tcgen05` PTX.

@ANIKET-SHIVAM, @IonThruster, @depaulmillz could you please take a look?

We can re-enable them if clang is happy now. PR please?

Yes, you are right. Actually, we rely heavily on inlining. Without it, we also have undefined behavior here and there. Not directly related, but to build with nvcc with -G, we...

`cutlass::half_t` is the fp16 data type implementation in cutlass. It is defined in https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/half.h#L167