composable_kernel

Add a scale op, related instances and examples

Open geyyer opened this issue 1 year ago • 1 comment

  • Add element op
  • Add instances
  • Add example
  • Add client example

geyyer Apr 15 '24 19:04

I have a good idea for this problem:

  • Add struct SCALAR : public BaseTensorLayout
  • Pass Tuple<SCALAR, SCALAR, SCALAR> as the layouts for Ds
  • Pass only one value, with shape={1} and strides={0}, for each of d0, d1, d2
  • Add private scale0, scale1, scale2 members to the ConvScale elementwise op, add a device function SetScales(scale0, scale1, scale2), and call it in the global function
  • In the kernel, add an if constexpr(is_same_v<Ds_layout.AT(i), SCALAR>) check; in that case, set the scale0, scale1, scale2 members to the values under the d0, d1, d2 pointers and pass an empty D tuple to the gridwise gemm
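
The steps above could be sketched roughly as follows. This is a host-only C++17 illustration, not composable_kernel code: `std::tuple` stands in for the library's `Tuple`, `BaseTensorLayout` is redeclared as an empty struct, `RowMajor`, `is_scalar_d_v` and `run_kernel_sketch` are hypothetical names, and the formula `c * scale0 * scale1 * scale2` is only an assumption about what ConvScale computes.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for the library types (not the real definitions).
struct BaseTensorLayout {};
struct SCALAR : public BaseTensorLayout {};   // proposed tag: this D is a single value
struct RowMajor : public BaseTensorLayout {}; // placeholder for any regular tensor layout

// Elementwise op that keeps the three scales as private members.
struct ConvScale
{
    void SetScales(float scale0, float scale1, float scale2)
    {
        scale0_ = scale0;
        scale1_ = scale1;
        scale2_ = scale2;
    }

    // Assumed formula: the output is c multiplied by all three scales.
    float operator()(float c) const { return c * scale0_ * scale1_ * scale2_; }

private:
    float scale0_ = 1.0f;
    float scale1_ = 1.0f;
    float scale2_ = 1.0f;
};

// True when the I-th D layout is the SCALAR tag.
template <typename DsLayout, std::size_t I>
inline constexpr bool is_scalar_d_v =
    std::is_same_v<std::tuple_element_t<I, DsLayout>, SCALAR>;

// Stand-in for the global function: picks the path at compile time.
template <typename DsLayout>
void run_kernel_sketch(const float* c, float* e, std::size_t n,
                       const float* d0, const float* d1, const float* d2)
{
    assert(c && e && d0 && d1 && d2);
    if constexpr(is_scalar_d_v<DsLayout, 0> && is_scalar_d_v<DsLayout, 1> &&
                 is_scalar_d_v<DsLayout, 2>)
    {
        // Scalar path: read each D pointer once, then run with no D tensors.
        ConvScale op;
        op.SetScales(*d0, *d1, *d2);
        for(std::size_t i = 0; i < n; ++i)
            e[i] = op(c[i]);
    }
    else
    {
        // Regular path: every output element loads its own d0, d1, d2 values.
        for(std::size_t i = 0; i < n; ++i)
            e[i] = c[i] * d0[i] * d1[i] * d2[i];
    }
}
```

Calling `run_kernel_sketch<std::tuple<SCALAR, SCALAR, SCALAR>>(c, e, n, &s0, &s1, &s2)` takes the scalar branch, so each of d0, d1, d2 only needs to point at a single value.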

Pros of this solution:

  • Easier API for the user (no need to pass unusual strides)
  • Less memory usage (only sizeof(datatype) bytes need to be allocated for each of d0, d1, d2)
  • The kernel does not need to load d0, d1, d2 for each C store
  • It allows using CScalarPerVector > 1
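
To make the memory and load-count points concrete, here is a small host-side sketch. It assumes the usual offset rule (sum of index times stride); `linear_offset`, `LoadCount` and `count_d_loads` are hypothetical helpers, not library functions. With all strides equal to 0, every index maps to the same element, which is why one value per D is enough; the counts compare reading D on every store against reading it once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element offset = sum over dimensions of idx[i] * strides[i].
std::size_t linear_offset(const std::vector<std::size_t>& idx,
                          const std::vector<std::size_t>& strides)
{
    assert(idx.size() == strides.size());
    std::size_t off = 0;
    for(std::size_t i = 0; i < idx.size(); ++i)
        off += idx[i] * strides[i];
    return off;
}

struct LoadCount
{
    std::size_t broadcast; // reads of D when it is treated as a stride-0 tensor
    std::size_t scalar;    // reads of D when it is copied into a member once
};

// Counts how often one D value is read while storing n output elements.
LoadCount count_d_loads(std::size_t n)
{
    LoadCount count{0, 0};
    for(std::size_t i = 0; i < n; ++i)
        ++count.broadcast; // broadcast path: one D read per C store
    count.scalar = 1;      // scalar path: a single read before the loop
    return count;
}
```

For example, `linear_offset({3, 5, 7}, {0, 0, 0})` is 0 for any index, and `count_d_loads(8)` reports 8 reads on the broadcast path against 1 on the scalar path.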

bartekxk May 04 '24 09:05