Haicheng Wu comments

Results: 323 comments of Haicheng Wu

[FEA] LinearCombinationSilu epilogue

You can set the channel number to 1, as you did, but that is not the most efficient implementation. We don't have an efficient depthwise conv in our...

[FEA] LinearCombinationSilu epilogue

Someone implemented it based on CUTLASS. Check the most-forked fork listed at https://github.com/NVIDIA/cutlass/network/members . They made it.

[FEA] LinearCombinationSilu epilogue

Depthwise conv is supported in 2.10. We will keep improving it.
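
For readers following the depthwise discussion above: a depthwise convolution filters each channel independently, with no reduction across channels, which is one reading of why it can be emulated (inefficiently) by a regular convolution whose channel count is 1. Below is a minimal reference sketch in plain C++, not CUTLASS code. The function name `depthwise_conv2d_ref`, the NHWC layout with batch size 1, the `[R][S][C]` filter layout, unit stride, and no padding are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper for illustration only; not part of CUTLASS.
// Depthwise 2D convolution (cross-correlation form):
//   input  : H x W x C   (NHWC with N = 1)
//   filter : R x S x C   (one R x S filter per channel)
//   output : P x Q x C   with P = H - R + 1, Q = W - S + 1 (unit stride, no padding)
std::vector<float> depthwise_conv2d_ref(const std::vector<float>& input,
                                        int H, int W, int C,
                                        const std::vector<float>& filter,
                                        int R, int S) {
  const int P = H - R + 1;
  const int Q = W - S + 1;
  std::vector<float> output(static_cast<std::size_t>(P) * Q * C, 0.0f);

  for (int p = 0; p < P; ++p) {
    for (int q = 0; q < Q; ++q) {
      for (int c = 0; c < C; ++c) {
        // Each output channel reads only its own input channel:
        // there is no accumulation over c, unlike a regular convolution.
        float acc = 0.0f;
        for (int r = 0; r < R; ++r) {
          for (int s = 0; s < S; ++s) {
            const std::size_t in_idx =
                (static_cast<std::size_t>(p + r) * W + (q + s)) * C + c;
            const std::size_t f_idx =
                (static_cast<std::size_t>(r) * S + s) * C + c;
            acc += input[in_idx] * filter[f_idx];
          }
        }
        output[(static_cast<std::size_t>(p) * Q + q) * C + c] = acc;
      }
    }
  }
  return output;
}
```

Because the reduction runs only over the R x S window, each output element does very little arithmetic per byte loaded. That is a plausible reason a kernel designed around large channel reductions performs poorly when the channel count is set to 1.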

add Conv singlestage

Thanks, I will take a look and run the tests. If any change is needed, I will do it myself and push to your branch.

add Conv singlestage

This is a useful feature for T4 or maybe small Ampere cards. I hope to work on it this month.

> We do have [singlestage mma](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/gemm/threadblock/mma_singlestage.h) pipeline for GEMMs. Do we have some use cases for T4 where single stage wins over 2-staged pipeline?

Yes, most kernels picked by cublas...

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs
