cutlass
[QST] thread num assert in sm70_epilogue_vectorized
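Hi @thakkarV, I have a question about this static assert: https://github.com/NVIDIA/cutlass/blob/47a3ebbea9860e14c095b52c4e6e2db33340f572/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp#L237
Strangely, it requires the number of threads in TiledCopyS2R to equal the number of threads in the MMA atom's C layout (AtomC). I think TiledCopyS2R describes how each thread performs its LDS (shared-memory load), so the assert should compare against the thread count of the whole tiled MMA instead: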
CUTE_STATIC_ASSERT(typename TiledCopyS2R::TiledNumThr{} == thr_size(typename TiledMma{}));
cc @hwu36 @ccecka Could you please help to check this issue?
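To make the distinction in the question concrete, here is a minimal toy sketch of the two thread counts being compared. It does not use CuTe at all: `ToyMmaAtom`, `ToyTiledMma`, `ToyTiledCopy`, and the sizes (an 8-thread atom tiled 4 x 4) are hypothetical stand-ins chosen for illustration, and the sketch makes no claim about which check CUTLASS should actually use.

```cpp
// Toy model only: these types are hypothetical stand-ins, not CuTe/CUTLASS types.
#include <cstddef>

// One MMA atom and the number of threads that cooperate on it.
template <std::size_t AtomThreads>
struct ToyMmaAtom {
  static constexpr std::size_t kThreads = AtomThreads;
};

// A tiled MMA that replicates the atom AtomsM x AtomsN times across the block.
template <class Atom, std::size_t AtomsM, std::size_t AtomsN>
struct ToyTiledMma {
  // Threads in a single atom (what the question calls "AtomC's threads").
  static constexpr std::size_t kAtomThreads = Atom::kThreads;
  // Threads in the whole tiled MMA (what the proposed assert compares against).
  static constexpr std::size_t kTiledThreads = Atom::kThreads * AtomsM * AtomsN;
};

// A tiled copy that is partitioned across NumThreads threads.
template <std::size_t NumThreads>
struct ToyTiledCopy {
  static constexpr std::size_t kTiledNumThr = NumThreads;
};

// Assumed sizes for illustration: an 8-thread atom tiled 4 x 4 gives 128 threads.
using ToyAtom  = ToyMmaAtom<8>;
using ToyMma   = ToyTiledMma<ToyAtom, 4, 4>;
using ToyS2R   = ToyTiledCopy<128>;

// A copy partitioned over every thread of the tiled MMA passes the "tiled" check ...
static_assert(ToyS2R::kTiledNumThr == ToyMma::kTiledThreads,
              "copy threads match the whole tiled MMA");
// ... but would fail a check against the single-atom thread count.
static_assert(ToyS2R::kTiledNumThr != ToyMma::kAtomThreads,
              "copy threads differ from the single-atom thread count");
```

In this toy model the two checks only agree when the tiled MMA consists of a single atom, which is the situation the question is implicitly contrasting against.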
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.