[QST][CuTeDSL] How to warp-reduce `half2/bfloat162`
What is your question?
CuTeDSL lacks `half2`/`bfloat162` types, so it is hard to describe a warp reduce that takes `half2`/`bfloat162` as the input type.
In CUDA C++, I can use

```cpp
for (...) {
  auto max = hmax2(max, __shfl_down(val, ..., ...));
}
```

to do two reductions with the same `__shfl_down` instruction.
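For reference, this is roughly what I have in mind as a complete kernel-side helper (my own sketch, assuming sm_80+ for `__hmax2` and the `__half2` overload of `__shfl_down_sync` from `cuda_fp16.h`):

```cpp
#include <cuda_fp16.h>

// Warp-level max reduction over a packed __half2: each shuffle moves both
// halves in one 32-bit register, and __hmax2 reduces both lanes at once.
__device__ __half2 warp_max_half2(__half2 val) {
    __half2 m = val;
    for (int offset = 16; offset > 0; offset >>= 1) {
        m = __hmax2(m, __shfl_down_sync(0xffffffffu, m, offset));
    }
    return m;  // lane 0 holds the per-half maxima of the warp
}
```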
In CuTeDSL, can I do the same thing? Or, if I do something like

```cpp
for (...) {
  auto max_1 = hmax(max_1, __shfl_down(val1, ..., ...));
  auto max_2 = hmax(max_2, __shfl_down(val2, ..., ...));
}
```

will the JIT compiler or ptxas fuse the two `__shfl_down`/`hmax` pairs into one instruction for me?
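For comparison, the unfused scalar form would look roughly like this (again just a sketch, assuming `__hmax` and the `__half` overload of `__shfl_down_sync`, sm_80+); I don't know whether the two shuffles per step end up packed into one:

```cpp
#include <cuda_fp16.h>

// Two independent scalar max reductions. The question is whether the JIT
// compiler or ptxas will pack the pair of __half values into one register
// and emit a single shuffle plus a single hmax2 per step.
__device__ void warp_max_two_halves(__half &max_1, __half &max_2) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        max_1 = __hmax(max_1, __shfl_down_sync(0xffffffffu, max_1, offset));
        max_2 = __hmax(max_2, __shfl_down_sync(0xffffffffu, max_2, offset));
    }
}
```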
Thanks for reporting. I think there might be two bugs:

- `res = max(res, __shfl_down(val, ..., ...))` should work for vectorized data
- the vectorized operation should be handled by the compiler to generate the `hmax2` instruction
@brandon-yujie-sun
I do not think this is a bug.
I am confused about what the best practice is when we want the compiler to generate a vectorised type or operation.
Option 1: Maybe CuTeDSL can export vectorised types?
Option 2: Perhaps I can pack two bfloat16 values into a uint32 value and manually write PTX instructions for it, and hope ptxas will reuse registers for me (see the sketch after this list).
Option 3: Maybe CuTeDSL can ensure the vectorised operation is generated by adding some Python constraints?
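A rough sketch of what I mean by Option 2, written in CUDA C++ for illustration rather than CuTeDSL (assuming `__hmax2` for `__nv_bfloat162` from `cuda_bf16.h` on sm_80+ instead of hand-written PTX):

```cpp
#include <cuda_bf16.h>

// Keep two bf16 values packed in one 32-bit register, shuffle the packed
// word, and reduce both halves with a single __hmax2 per step.
__device__ __nv_bfloat162 warp_max_bf162(__nv_bfloat162 v) {
    unsigned packed = *reinterpret_cast<unsigned *>(&v);
    for (int offset = 16; offset > 0; offset >>= 1) {
        unsigned other = __shfl_down_sync(0xffffffffu, packed, offset);
        __nv_bfloat162 m = __hmax2(*reinterpret_cast<__nv_bfloat162 *>(&packed),
                                   *reinterpret_cast<__nv_bfloat162 *>(&other));
        packed = *reinterpret_cast<unsigned *>(&m);
    }
    return *reinterpret_cast<__nv_bfloat162 *>(&packed);
}
```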
> Option 1: Maybe CuTeDSL can export vectorised types?
TensorSSA is the vectorized type. For this case, using TensorSSA with `res = max(res, __shfl_down(val, ..., ...))` should just work, but we currently don't handle this correctly.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.