Haicheng Wu

323 comments by Haicheng Wu

You can set the channel number as 1 just like what you did, but it is not the most efficient implementation. We don't have an efficient depthwise conv in our...

Someone implemented it on top of CUTLASS. Check the most-forked repository in https://github.com/NVIDIA/cutlass/network/members . They made it.

Depthwise conv is supported in 2.10. We will keep improving it.

Thanks, I will take a look and run the tests. If any change is needed, I will do it myself and push to your branch.

This is a useful feature for T4 or maybe small Ampere cards. I hope to work on it this month.

> We do have [singlestage mma](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/gemm/threadblock/mma_singlestage.h) pipeline for GEMMs. Do we have some use cases for T4 where single stage wins over 2-staged pipeline?

Yes, most kernels picked by cublas...

Hi @leiwen83, sorry for the delay. I am working on this one now. Have you tested your code in any way?

What type of batch are you interested in? Data types, layouts, architectures, etc.

> normal NCHW or NHWC is preferred

Gemm works on 2D data. Do you want row major or column major for each A, B, C in C = A x...