Haicheng Wu

323 comments by Haicheng Wu

It is doable; we just haven't done it yet. Do you want to add an m×n matrix or a per-channel bias vector?
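The difference between the two options is only in how the epilogue indexes the bias term. A minimal sketch in plain C++ (not the CUTLASS API; names are illustrative), with D = alpha * accumulator + bias:

```cpp
#include <cstddef>
#include <vector>

// Per-channel bias: one value per output column (channel), broadcast down
// every row of the m-by-n output tile.
void epilogue_per_channel_bias(std::vector<float>& d, const std::vector<float>& acc,
                               const std::vector<float>& bias, // size n
                               std::size_t m, std::size_t n, float alpha) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            d[i * n + j] = alpha * acc[i * n + j] + bias[j]; // broadcast over rows
}

// Full m-by-n bias: an independent value for every output element.
void epilogue_matrix_bias(std::vector<float>& d, const std::vector<float>& acc,
                          const std::vector<float>& bias, // size m * n
                          std::size_t m, std::size_t n, float alpha) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            d[i * n + j] = alpha * acc[i * n + j] + bias[i * n + j];
}
```

The per-channel vector is the common case for conv/GEMM fusion, since it only needs n extra values loaded per tile instead of m×n.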

Currently, we assume the second GEMM's problem size k is a multiple of the threadblock tile size k. We can fix this pretty quickly. Until then, you can first use the...
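Until the restriction is lifted, one workaround is to check the constraint up front, or zero-pad the operands so k becomes a multiple of the tile size (padding with zeros leaves the GEMM result unchanged). A sketch, assuming a threadblock tile k of 32 (the actual value depends on the kernel configuration):

```cpp
#include <cstddef>

// Assumed threadblock tile size in the k dimension; check your kernel's
// ThreadblockShape::kK for the real value.
constexpr std::size_t kThreadblockTileK = 32;

// The current restriction: the second GEMM's k must divide evenly.
constexpr bool k_is_supported(std::size_t k) {
    return k % kThreadblockTileK == 0;
}

// Round k up to the next multiple of the tile size; zero-padding the
// operands up to this size keeps the math identical.
constexpr std::size_t round_up_k(std::size_t k) {
    return (k + kThreadblockTileK - 1) / kThreadblockTileK * kThreadblockTileK;
}
```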

The code you posted belongs to CUTLASS 0.1; the current CUTLASS looks very different. Here is how the top level looks if you use Tensor Cores: https://github.com/NVIDIA/cutlass/blob/master/examples/14_ampere_tf32_tensorop_gemm/ampere_tf32_tensorop_gemm.cu#L212-L226 thread number...

Different problem sizes need different tile sizes. You can use the CUTLASS profiler to find them. Here is the doc: https://github.com/NVIDIA/cutlass/blob/master/media/docs/profiler.md You can use `cmake .. -DCUTLASS_NVCC_ARCHS="75" -DCUTLASS_LIBRARY_KERNELS=sgemm` to only generate...
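A typical end-to-end flow, following the linked profiler doc (the problem size below is an example; substitute your own):

```shell
# Generate only the SGEMM kernels for SM75 to keep the build small.
cmake .. -DCUTLASS_NVCC_ARCHS="75" -DCUTLASS_LIBRARY_KERNELS=sgemm
make cutlass_profiler -j

# Profile every generated sgemm kernel at your problem size; the report
# ranks tile configurations by achieved throughput.
./tools/profiler/cutlass_profiler --kernels=sgemm --m=1024 --n=1024 --k=1024
```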

`LinearCombinationRelu` has a default value for `beta`; `LinearCombinationSilu` does not. I can add one very quickly. To work around it, you can change this line (https://github.com/NVIDIA/cutlass/blob/master/examples/17_fprop_per_channel_bias/fprop_per_channel_bias.cu#L196) to `{alpha, ElementComputeEpilogue(0)}`
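The difference is just a defaulted constructor parameter. A stand-in sketch (these are not the real CUTLASS types) of why one epilogue accepts a single-argument initializer and the other needs `beta` spelled out:

```cpp
// Mimics an epilogue params struct whose beta is defaulted: {alpha} compiles.
struct ReluLikeParams {
    float alpha;
    float beta;
    ReluLikeParams(float alpha_, float beta_ = 0.0f) : alpha(alpha_), beta(beta_) {}
};

// Mimics one without the default: callers must write {alpha, 0.0f}
// explicitly, which is exactly the {alpha, ElementComputeEpilogue(0)}
// workaround above.
struct SiluLikeParams {
    float alpha;
    float beta;
    SiluLikeParams(float alpha_, float beta_) : alpha(alpha_), beta(beta_) {}
};
```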

`ThreadblockShape` and `WarpShape` are, as their names suggest, the tile sizes of the threadblock and the warp, respectively. `InstructionShape` is the size of the Tensor Core instruction. You can check the kernels generated by...
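The three shapes form nested tiling levels, and each level must evenly tile the one above it. A minimal stand-in for `cutlass::gemm::GemmShape`; the concrete sizes are example values for an f16 tensor-op kernel, not a recommendation:

```cpp
// Minimal mimic of cutlass::gemm::GemmShape<M, N, K>.
template <int M, int N, int K>
struct GemmShape {
    static constexpr int kM = M, kN = N, kK = K;
};

using ThreadblockShape = GemmShape<128, 128, 32>; // tile computed by one threadblock
using WarpShape        = GemmShape<64, 64, 32>;   // tile computed by one warp
using InstructionShape = GemmShape<16, 8, 16>;    // one Tensor Core mma instruction

// Warps tile the threadblock (2x2 warps here), and instructions tile the warp.
static_assert(ThreadblockShape::kM % WarpShape::kM == 0, "warps must tile the threadblock");
static_assert(WarpShape::kM % InstructionShape::kM == 0, "instructions must tile the warp");
```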

> That is to say, using Tensor Cores in CUTLASS with half requires the input and output channels to be 8-aligned in NHWC format

We support small alignment...
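Where the "8" comes from: the widest global-memory vector access is 128 bits, so the maximum alignment in elements is 128 divided by the element's bit width. A sketch of that arithmetic (the efficiency remark is an assumption of how smaller alignments behave, not a measured claim):

```cpp
// Maximum vectorized alignment, in elements, for a 128-bit memory access.
constexpr int max_alignment_elements(int bits_per_element) {
    return 128 / bits_per_element;
}

// half  (16-bit): 128 / 16 = 8 elements -> the "8-aligned NHWC" requirement
// float (32-bit): 128 / 32 = 4 elements
// Smaller alignments (4, 2, 1 for half) still work, just with narrower,
// less efficient memory instructions.
```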

I added a default for beta in https://github.com/NVIDIA/cutlass/commit/e49f690fd7969015343a2b5d72549848e760eb65

Your channel count is small, and your filter size is small too, so not much time is spent in the conv. SiLU is an expensive operation, so it can take the...
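For reference, SiLU (also called swish) is x * sigmoid(x). Unlike ReLU's single compare, each element needs an exp and a divide, which is why it can dominate the runtime when the conv itself is tiny:

```cpp
#include <cmath>

// SiLU / swish: x * sigmoid(x) = x / (1 + e^-x).
// One exp + one divide per element, versus one compare for ReLU.
inline float silu(float x) {
    return x / (1.0f + std::exp(-x));
}
```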