[QST] Question about CUTLASS epilogue customization

Open zwshan opened this issue 1 year ago • 16 comments

What is your question? Does the CUTLASS epilogue support customization? I want to perform a bias addition after the matmul operation in CUTLASS. Additionally, I would like to apply different activation functions to different regions of the output (for example, sigmoid and tanh). Is it possible to implement this? If so, could you please show me how to do it?

zwshan avatar Jan 02 '24 14:01 zwshan

Yes, you can do any elementwise operation you want. Are you using cutlass 2.x or 3.x? Which architecture?

hwu36 avatar Jan 05 '24 16:01 hwu36

I am using 3.x on an A100 (SM80). Could you help me?

zwshan avatar Jan 08 '24 08:01 zwshan

I would like to apply different activation functions to different regions in one A*B result (for example, sigmoid and tanh)

zwshan avatar Jan 08 '24 08:01 zwshan

[image] Can CUTLASS do this? @hwu36

zwshan avatar Jan 08 '24 13:01 zwshan

@thakkarV to comment on how to do it with 3.x on A100.

hwu36 avatar Jan 08 '24 14:01 hwu36

cc @apuaaChen for thoughts on how to do this with CUTLASS 3.x SM80 EVT (likely would need some added ops)

jackkosaian avatar Jan 08 '24 15:01 jackkosaian

this should be similar to epilogue scatter fusion since it needs to compute row number, too.

hwu36 avatar Jan 08 '24 15:01 hwu36

this is pretty easy to do in the CUTLASS 3 epilogues, but it is not supported out of the box, so you will have to make a minor modification to write a custom epilogue: add one branch that dispatches to the activation function depending on the coordinate of the output element. You already have access to the coordinate tensors, which are used for predicated stores to gmem, so just use those. https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp#L230
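The coordinate-based dispatch described above can be sketched on the host as a plain C++ reference. This is not CUTLASS code: `RegionActivation`, `split_row`, and `apply_epilogue` are hypothetical names, the regions are assumed to be split by row, and the bias is assumed to be one value per output column. In a real custom epilogue the same branch would run per element, using the coordinate that is already available for predicated stores.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical region rule (an assumption for illustration):
// rows below `split_row` use sigmoid, all other rows use tanh.
struct RegionActivation {
    int split_row;

    float operator()(float x, int row, int /*col*/) const {
        if (row < split_row) {
            return 1.0f / (1.0f + std::exp(-x));  // sigmoid
        }
        return std::tanh(x);
    }
};

// Host-side reference of the fused epilogue:
//   D[m][n] = act(acc[m][n] + bias[n], m, n)
// `acc` is the row-major M x N matmul result; `bias` holds N entries.
std::vector<float> apply_epilogue(const std::vector<float>& acc,
                                  const std::vector<float>& bias,
                                  int M, int N,
                                  const RegionActivation& act) {
    assert(acc.size() == static_cast<std::size_t>(M) * N);
    assert(bias.size() == static_cast<std::size_t>(N));

    std::vector<float> out(acc.size());
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            const std::size_t idx = static_cast<std::size_t>(m) * N + n;
            // One extra branch on the output coordinate picks the activation.
            out[idx] = act(acc[idx] + bias[n], m, n);
        }
    }
    return out;
}
```

A reference like this is also useful for checking the output of the modified kernel element by element.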

thakkarV avatar Jan 08 '24 16:01 thakkarV

btw, I do not want to discourage you from using the 3.x API on Ampere; it's totally fine. We just recommend the 2.x API for the best-performing Ampere kernels, since those have been well tuned over the years.

thakkarV avatar Jan 08 '24 16:01 thakkarV

the alternative solution is to just launch two different kernels on two separate streams, which will likely give you equivalent or perhaps even better perf, depending on the problem shapes and on whether the boundary between the activation functions falls within an output tile or between output tiles.
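A minimal sketch of how the two-kernel split could be planned on the host, assuming the activation regions are divided along the M (row) dimension. `SubProblem`, `Split`, `split_by_row`, and the tile height `tile_m` are hypothetical names, not part of CUTLASS. Each sub-problem would then be launched as its own GEMM with a stock single-activation epilogue, and `tile_aligned` reports whether the boundary lands on an output-tile edge.

```cpp
#include <cassert>

// Rows of A and D handled by one kernel launch (hypothetical helper type).
struct SubProblem {
    int row_begin;  // first output row covered by this launch
    int rows;       // number of output rows (the M extent of this GEMM)
};

struct Split {
    SubProblem first;   // e.g. the sigmoid region
    SubProblem second;  // e.g. the tanh region
    bool tile_aligned;  // true if the boundary falls on an output-tile edge
};

// Split an M-row problem at `split_row`. `tile_m` is the assumed height of
// one output tile (the threadblock tile M), used only to report alignment.
Split split_by_row(int M, int split_row, int tile_m) {
    assert(M > 0 && tile_m > 0);
    assert(split_row >= 0 && split_row <= M);

    Split s;
    s.first = SubProblem{0, split_row};
    s.second = SubProblem{split_row, M - split_row};
    s.tile_aligned = (split_row % tile_m == 0);
    return s;
}
```

For example, `split_by_row(1024, 512, 128)` yields two 512-row problems with an aligned boundary, while a split at row 500 leaves the boundary inside a tile of the original problem.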

thakkarV avatar Jan 08 '24 16:01 thakkarV

Thank you, everyone! You helped me a lot. I will try it in the morning.

zwshan avatar Jan 08 '24 16:01 zwshan

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Feb 07 '24 17:02 github-actions[bot]

@zwshan have you resolved your issue?

mnicely avatar Feb 22 '24 15:02 mnicely

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Mar 23 '24 16:03 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar Jun 21 '24 17:06 github-actions[bot]

Hi! This can be supported by adding a new node. To get the row number, you can follow the example here: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp#L749-L752 The coord at line 749 is (row number, column number, batch number).

apuaaChen avatar Jun 21 '24 19:06 apuaaChen