cutlass
cutlass copied to clipboard
[QST] Is it possible to detect output coordinates in elementwise epilogue ?
What is your question?
Hello! I want to implement elementwise epilogue, which depends on output matrix coordinates, i.e.
d_ij = F(alpha * sum_k(a_ik * b_kj) + c_ij, i, j)
It feels like providing own ActivationFunctor
into cutlass::epilogue::thread::LinearCombinationGeneric
are the simplest and intuitive way to implement own custom epilogue, but it looks like impossible to determine element coordinates inside ActivationFunctor::operator()
.
Am I right that there is no easy way of doing this?
Also I'm fine with any solution based on any cutlass major version (2/3).
Thanks!
For CUTLASS 3.x epilogues based on CuTe, its trivial to inject the coordinate from the collective epi into the thread functor.
We already create the coordinate tensor for the purposes of in-bounds check and predication here: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp#L231
You can just write a custom epi thread op and pass the C and D coordinates into it.
I.e. extend this line:
https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp#L307
to include the coordinate tensor output as well (tDcDmn
tensor in the code)
Same applies to the direct store non-optimized epi
@hwu36 can help answer for 2.x API epilogues.
@thakkarV thank you, I will check your proposal and response later.
But I will be pleased to have answer for 2.x API too, so response from @hwu36 will not be superfluous.
I feel it needs some plumbing to get it work in cutlass 2.x. when the epilogue calls store
or load
, the coordinates are calculated inside the iterator: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h#L410-L411
activation function is called however in the upper level: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/epilogue.h#L374 which is right before calling the iterator to call the store. So, if you need to use cutlass 2.x, you need to use the iterator logic to compute row, col and then choose the activation func, and then calling store.
@ankutalev has your issues been resolved?
This issue has been labeled inactive-30d
due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d
if there is no activity in the next 60 days.
@mnicely , yes, sorry for late reply