Haicheng Wu

Results: 323 comments by Haicheng Wu

CUTLASS has a SIMT conv kernel for sm75.

@ccecka, @thakkarV, is more work needed in this PR?

Found some compiler bugs when working on yours. Addressing these bugs now.

Sorry. This one is good. Will merge it around the time we tag 3.5.1, which is this weekend or next Monday. I thought you were pinging me for your mixed input...

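The output is not deterministic if you use atomics with floating point, because floating-point addition is not associative and the order of the atomic adds changes from run to run. That makes numeric issues hard to debug. Atomics are usually faster though.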
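
To see why the order of the adds matters, here is a minimal host-side sketch in plain C++ (not CUTLASS code, and no real atomics involved). `sum_in_order` is a hypothetical helper that just accumulates in single precision in whatever order it is given, which is roughly what one particular arrival order of atomic adds would do.

```cpp
#include <vector>

// Hypothetical helper for illustration only: accumulate the values left to
// right in single precision, mimicking one arrival order of atomic adds.
float sum_in_order(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float v : values) {
        acc += v;  // each add rounds to float, so the order affects the result
    }
    return acc;
}
```

With IEEE single precision, `sum_in_order({1.0e8f, -1.0e8f, 1.0f})` gives `1.0f`, while `sum_in_order({1.0f, 1.0e8f, -1.0e8f})` gives `0.0f`, because `1.0f` is lost when it is added to `1.0e8f`. Same inputs, different order, different answer.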

FYI, it is a waste of time to hipify CUTLASS since CUTLASS uses lots of inline PTX. You could rewrite the inline PTX in CUDA, but then it is not CUTLASS anymore.

The conv epilogue reuses the gemm epilogue, which assumes the output is a row-major dense matrix. The column number is `K` which is 1 in your case, and the row number is...

I would say that you would have to hack whatever is not right. The code is like ```for (row) for (col) *memory_ptr(row, col) = ``` you essentially need...
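
For reference, a row-major dense store of that shape can be sketched as below. This is plain C++ for illustration, not the actual CUTLASS epilogue: `store_row_major`, `acc`, and `ld` are hypothetical names, and the row count used in the usage note is arbitrary since the comments above do not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a CUTLASS API: copy a dense rows x cols tile `acc`
// into a row-major buffer `out` whose consecutive rows are `ld` elements apart.
void store_row_major(std::vector<float>& out, const std::vector<float>& acc,
                     std::size_t rows, std::size_t cols, std::size_t ld) {
    assert(ld >= cols);
    assert(acc.size() >= rows * cols);
    assert(rows == 0 || out.size() >= (rows - 1) * ld + cols);
    for (std::size_t row = 0; row < rows; ++row) {
        for (std::size_t col = 0; col < cols; ++col) {
            // Row-major addressing: the offset is row * ld + col.
            out[row * ld + col] = acc[row * cols + col];
        }
    }
}
```

With `cols == 1` (the `K = 1` case), every row holds a single element and `ld` alone decides where each one lands, so an output that is not laid out this way needs a different address calculation inside the loop.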