
[QST] How to efficiently implement 16bit convolution

nolyn opened this issue 2 years ago • 1 comment

I want to implement a convolution of two 16-bit integer inputs. I think this can be split into 4 convolutions of 8-bit inputs, plus some bit shifts / scaling and a final addition (I sketch the arithmetic below the two options). I'm wondering how to do this most efficiently. In particular I see two options:

  • Keep the output type as int32 so it is large enough for the accumulated result, do all 4 convolutions sequentially, and reuse the same output, i.e. for the latter three use alpha=1, beta=1 in the epilogue (though I would like to avoid the unnecessary multiplication by 1; maybe I can write a custom epilogue for it, and I might have to do so anyway to scale the output to the correct range)
  • Use 4 distinct convolutions with their own output tensors, and sum them all up only once all of them are done
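For reference, the arithmetic I have in mind for the split looks roughly like this (a plain C++ check of the decomposition on a tiny 1-D example, no CUTLASS involved; note that the low bytes are unsigned, so the int8 kernels would either need mixed s8/u8 support or an extra correction term, which I gloss over here):

```cpp
// Each int16 value x is written as x = hi * 256 + lo with hi = x >> 8 (signed,
// arithmetic shift assumed) and lo = x & 0xFF (unsigned), so by linearity:
//   conv(A, B) = 65536 * conv(A_hi, B_hi)
//              +   256 * (conv(A_hi, B_lo) + conv(A_lo, B_hi))
//              +         conv(A_lo, B_lo)
#include <cstdint>
#include <cstdio>
#include <vector>

// Naive "valid" 1-D convolution with int32 accumulation, used only as a reference.
static std::vector<int32_t> conv1d(const std::vector<int32_t>& x,
                                   const std::vector<int32_t>& w) {
    std::vector<int32_t> y(x.size() - w.size() + 1, 0);
    for (size_t i = 0; i < y.size(); ++i)
        for (size_t j = 0; j < w.size(); ++j)
            y[i] += x[i + j] * w[j];
    return y;
}

int main() {
    std::vector<int16_t> a = {1000, -2000, 300, 40}, b = {-500, 700};
    auto hi = [](int16_t v) { return int32_t(v >> 8); };     // signed high byte
    auto lo = [](int16_t v) { return int32_t(uint8_t(v)); }; // unsigned low byte

    std::vector<int32_t> a_hi, a_lo, b_hi, b_lo, a32, b32;
    for (int16_t v : a) { a_hi.push_back(hi(v)); a_lo.push_back(lo(v)); a32.push_back(v); }
    for (int16_t v : b) { b_hi.push_back(hi(v)); b_lo.push_back(lo(v)); b32.push_back(v); }

    auto ref = conv1d(a32, b32);
    auto hh = conv1d(a_hi, b_hi), hl = conv1d(a_hi, b_lo);
    auto lh = conv1d(a_lo, b_hi), ll = conv1d(a_lo, b_lo);
    for (size_t i = 0; i < ref.size(); ++i) {
        int32_t split = 65536 * hh[i] + 256 * (hl[i] + lh[i]) + ll[i];
        std::printf("%zu: ref=%d split=%d\n", i, ref[i], split); // values should match
    }
    return 0;
}
```

The 65536/256/1 weights are exactly the scaling I would want to fold into the epilogue in option 1.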

Which one could I expect to be faster? Intuitively I would say the second, but maybe not, because of data synchronization. Will all 4 convolutions be launched in parallel, or is a cudaDeviceSynchronize done implicitly somewhere, so that each one has to wait for the previous? For the first approach, maybe it can utilize local memory better? Also, I just read that there is now a back-to-back GEMM/CONV fusion; maybe I should try to use that to implement this.
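To make the parallelism question concrete, here is roughly how I imagine option 2 with separate streams. As far as I can tell, the device-level CUTLASS operators just enqueue a kernel on the stream they are given and do not synchronize the device themselves, so overlap should mostly depend on whether a single convolution already saturates the GPU. `launch_int8_conv` below is a hypothetical wrapper around an int8 CUTLASS convolution that forwards a cudaStream_t; the rest is plain CUDA:

```cuda
// Sketch of option 2: four int8 convolutions on separate streams, summed afterwards.
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical: runs one int8 convolution on the given stream and writes an
// int32 result into `out` (sign handling of the low bytes glossed over, see the split above).
void launch_int8_conv(const int8_t* act, const int8_t* flt, int32_t* out,
                      cudaStream_t stream);

// Combine the four partial results with the weights from the 16-bit split.
__global__ void combine(const int32_t* hh, const int32_t* hl, const int32_t* lh,
                        const int32_t* ll, int32_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 65536 * hh[i] + 256 * (hl[i] + lh[i]) + ll[i];
}

void conv_int16_via_int8(const int8_t* a_hi, const int8_t* a_lo,
                         const int8_t* b_hi, const int8_t* b_lo,
                         int32_t* parts[4], int32_t* out, int n_out) {
    cudaStream_t s[4];
    for (auto& st : s) cudaStreamCreate(&st);

    // Each convolution only waits for work in its own stream, so the four
    // launches may overlap if a single one does not already fill the GPU.
    launch_int8_conv(a_hi, b_hi, parts[0], s[0]);
    launch_int8_conv(a_hi, b_lo, parts[1], s[1]);
    launch_int8_conv(a_lo, b_hi, parts[2], s[2]);
    launch_int8_conv(a_lo, b_lo, parts[3], s[3]);

    // Wait for all four, then one cheap elementwise pass to combine them.
    for (auto& st : s) cudaStreamSynchronize(st);
    combine<<<(n_out + 255) / 256, 256>>>(parts[0], parts[1], parts[2], parts[3], out, n_out);

    for (auto& st : s) cudaStreamDestroy(st);
}
```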

Thanks in advance!

nolyn • May 31 '22 07:05

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] • Jul 11 '22 12:07

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] • Oct 09 '22 12:10

Closing due to inactivity. Please reopen if needed.

mnicely • Apr 27 '23 14:04