Interfaces compared with cutlass
Hi, will composable_kernel provide an interface similar to CUTLASS? Some projects based on CUTLASS, such as FasterTransformer and xformers, are difficult to hipify.
Do you have any plans, such as a thin wrapper library?
Hi @hclearner, I prepared an example of creating a layout using a tensor descriptor, plus a draft of a Layout wrapper: #1054. Could you take a look? Which version do you prefer?
Hi @bartekxk, the two implementations are equivalent; I prefer the second version. As I understand it, though, this is just the layout built on the tensor descriptor in CK. It is still very difficult for us to hipify the CUTLASS API to the CK API in projects such as xformers, FasterTransformer, tiny-cuda-nn, and DeepSpeed. Do you have any plans for a counterpart library, like rocBLAS for cuBLAS, RCCL for NCCL, or hipCUB for CUB? Thanks :)
@hclearner Yes, I prepared this example to confirm that an API like this one is needed: https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/1054/files#diff-cd0bbbc81c77205e3a568fe45046b2e20f8cc7fe5a2d3c991a59add9f2c63d61. I will move it from the examples into the library itself and develop it further, starting with Layout and basic support for other structures. If there are operations you particularly need, let me know. Thanks
@bartekxk Cool! We're hipifying libraries such as xformers for AMD GPUs. The operation we need is mem_efficient_attention: https://github.com/facebookresearch/xformers/tree/main/xformers/csrc/attention/cuda/fmha
Hi @hclearner, thanks. Maybe I'm wrong, but I don't see any CuTe usage in https://github.com/facebookresearch/xformers/tree/main/xformers/csrc/attention/cuda/fmha. Or do you need this simply because such a wrapper would be easier for you to use? I just want to know which part of the wrapper support I should focus on most.
@bartekxk Maybe my description was not clear. My understanding is that CuTe is a part of CUTLASS. The problem is that hipifying CUTLASS code to CK is difficult and time-consuming, for example https://github.com/facebookresearch/xformers/blob/1254a167bacab5b373b9807070354097a65f3e96/xformers/csrc/attention/cuda/fmha/kernel_forward.h#L641 and https://github.com/facebookresearch/xformers/blob/1254a167bacab5b373b9807070354097a65f3e96/xformers/csrc/attention/cuda/fmha/kernel_forward.h#L453. xformers has already been hipified for ROCm with CK in https://github.com/ROCmSoftwarePlatform/xformers, but many other projects are based on CUTLASS and are also hard to hipify.
Hi @hclearner, after rapid development we have prepared a basic version of the requested API. You can refer to:
I will be grateful for your feedback and will be happy to help if there are any problems or further requirements.
@hclearner Closing ticket as complete. Please re-open if you require further assistance with this ticket. Thanks!