composable_kernel
Add a scale op, related instances and examples
- Add element op
- Add instances
- Add example
- Add client example
I have a good idea for this problem:
- Add struct SCALAR : public BaseTensorLayout
- Pass Tuple<SCALAR, SCALAR, SCALAR> as layouts for Ds
- Pass only one value, with shape={1} and strides={0}, for each of d0, d1, d2
- Add private scale0, scale1, scale2 members to the ConvScale elementwise op, add a device function SetScales(scale0, scale1, scale2), and call it in the global function
- In the kernel, add an if constexpr(is_same_v<Ds_layout.AT(i), SCALAR>) check; in that case, set the scale0, scale1, scale2 members to the values behind the d0, d1, d2 pointers and pass an empty D tuple to the gridwise GEMM
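A host-side C++17 sketch of the proposal, for illustration only. `BaseTensorLayout`, `SCALAR`, `ConvScale`, and `SetScales` follow the names above, but `PrepareElementOp`, `DemoScalarPath`, the use of `std::tuple` in place of CK's `Tuple`, and the formula `e = c * scale0 * scale1 * scale2` are hypothetical assumptions, not CK's actual code. It shows the `if constexpr` dispatch: when every D layout is `SCALAR`, the three values are read once into the element op and an empty D tuple is forwarded.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for CK types (illustrative only).
struct BaseTensorLayout {};
// Proposed layout tag: the D tensor holds one value (shape {1}, strides {0}).
struct SCALAR : public BaseTensorLayout {};

// Element op keeping the scales as private members instead of reading Ds per element.
class ConvScale {
public:
    void SetScales(float s0, float s1, float s2) { scale0_ = s0; scale1_ = s1; scale2_ = s2; }
    // Assumed combination; the real op may combine the scales differently.
    void operator()(float& e, const float& c) const { e = c * scale0_ * scale1_ * scale2_; }
private:
    float scale0_ = 1.f, scale1_ = 1.f, scale2_ = 1.f;
};

// Hypothetical kernel-side step: if all D layouts are SCALAR, copy the scalars into the
// op and return an empty D tuple for the gridwise GEMM; otherwise pass the Ds through.
template <typename DsLayout, typename DsPointer>
auto PrepareElementOp(ConvScale& op, const DsPointer& p_ds) {
    constexpr bool all_scalar =
        std::is_same_v<std::tuple_element_t<0, DsLayout>, SCALAR> &&
        std::is_same_v<std::tuple_element_t<1, DsLayout>, SCALAR> &&
        std::is_same_v<std::tuple_element_t<2, DsLayout>, SCALAR>;
    if constexpr (all_scalar) {
        op.SetScales(*std::get<0>(p_ds), *std::get<1>(p_ds), *std::get<2>(p_ds));
        return std::tuple<>{};  // GEMM sees no D tensors
    } else {
        return p_ds;  // regular path: D tensors loaded for each C store
    }
}

// Usage: one value per d0, d1, d2; returns the scaled value of c = 3.
inline float DemoScalarPath() {
    const float d0 = 2.f, d1 = 0.5f, d2 = 4.f;
    auto p_ds = std::make_tuple(&d0, &d1, &d2);
    ConvScale op;
    auto ds_for_gemm = PrepareElementOp<std::tuple<SCALAR, SCALAR, SCALAR>>(op, p_ds);
    static_assert(std::tuple_size_v<decltype(ds_for_gemm)> == 0, "empty D tuple expected");
    float e = 0.f;
    op(e, 3.f);
    return e;
}
```

Because the dispatch is `if constexpr`, the scalar path compiles away the D tuple entirely, so the gridwise GEMM needs no special handling for `SCALAR` layouts.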
Pros of this solution:
- Easier API for the user (they don't need to pass unusual strides)
- Less memory usage (only sizeof(datatype) bytes need to be allocated for each of d0, d1, d2)
- The kernel doesn't need to load d0, d1, d2 for each C store
- It allows using CScalarPerVector > 1
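A toy host-side model of the third and fourth points, assuming a hypothetical epilogue that stores N C values as one vector. `StoreWithDTensors`, `StoreWithScalars`, and `EpilogueResult` are illustrative names, not CK code. With D tensors (strides 0), each element still issues three D reads; with the scales already held by the element op, the loop reads no D memory, so all N lanes can be scaled and stored together.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N>
struct EpilogueResult {
    std::array<float, N> e{};
    std::size_t d_loads = 0;  // number of D reads performed
};

// Path A: broadcast D tensors; stride 0 means the same address, but still a read per element.
template <std::size_t N>
EpilogueResult<N> StoreWithDTensors(const std::array<float, N>& c,
                                    const float* d0, const float* d1, const float* d2) {
    constexpr std::size_t stride = 0;
    EpilogueResult<N> r;
    for (std::size_t i = 0; i < N; ++i) {
        r.e[i] = c[i] * d0[i * stride] * d1[i * stride] * d2[i * stride];
        r.d_loads += 3;
    }
    return r;
}

// Path B: scales set once beforehand (as SetScales would); no D reads inside the loop.
template <std::size_t N>
EpilogueResult<N> StoreWithScalars(const std::array<float, N>& c, float s0, float s1, float s2) {
    EpilogueResult<N> r;
    for (std::size_t i = 0; i < N; ++i) r.e[i] = c[i] * s0 * s1 * s2;  // same arithmetic order as A
    return r;
}
```

Both paths produce the same values; only the memory traffic differs, which is the point of the third bullet.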