
[WIP] add more examples for permute/scatter-gather/moe/tile-reduce/fa

Open carlushuang opened this issue 1 year ago • 0 comments

  • [x] add test topk
  • [x] add example topk-softmax
  • [x] add test tile_reduce
  • [x] add test scatter-gather
  • [x] add tensor transform support for scatter-gather
  • [x] modify buffer raw related tile api
  • [x] add async load (non-raw version) api
  • [x] add block_tile_reduce_xor_sync() api
  • [x] add BlockReduce2D operator for thread+warp reduce
  • [x] add example permute
  • [x] add example elementwise
  • [x] add unpack-static-ford/unpack-sweep-tile-span
  • [x] add tile_window_linear to better control flag/voffset
  • [ ] add permute utility kernel for moe index
  • [ ] debug moe-ffn pipeline
  • [ ] add example moe
  • [ ] refine fa pipeline
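
For readers unfamiliar with the XOR-style reduction that the `block_tile_reduce_xor_sync()` item refers to, here is a minimal host-side sketch of a butterfly (XOR) all-reduce. It assumes the API follows the usual pattern in which each lane combines its value with the lane whose index differs by one bit, halving the stride each step. The name `xor_butterfly_reduce` is hypothetical and is not part of composable_kernel. The lanes are simulated with a plain array rather than GPU cross-lane instructions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, NOT a composable_kernel API: simulates an XOR
// (butterfly) all-reduce across the lanes of one warp on the host.
// After log2(Lanes) steps every lane holds the full reduction result.
template <typename T, std::size_t Lanes, typename Op>
std::array<T, Lanes> xor_butterfly_reduce(std::array<T, Lanes> lanes, Op op)
{
    static_assert(Lanes != 0 && (Lanes & (Lanes - 1)) == 0,
                  "lane count must be a power of two");

    for (std::size_t stride = Lanes / 2; stride > 0; stride /= 2)
    {
        // All lanes exchange "simultaneously", so read from the old
        // values and write into a separate buffer.
        std::array<T, Lanes> next = lanes;
        for (std::size_t lane = 0; lane < Lanes; ++lane)
        {
            // Partner lane differs from this lane in exactly one bit.
            next[lane] = op(lanes[lane], lanes[lane ^ stride]);
        }
        lanes = next;
    }
    return lanes;
}
```

Calling it on eight lanes holding 1 through 8 with an addition functor leaves the sum in every lane, which is the property that distinguishes the XOR pattern from a one-directional shuffle-down reduction, where only lane 0 ends up with the result.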

carlushuang, Sep 16 '24 09:09