composable_kernel
[WIP] add more examples for permute/scatter-gather/moe/tile-reduce/fa
- [x] add test topk
- [x] add example topk-softmax
- [x] add test tile_reduce
- [x] add test scatter-gather
- [x] add tensor transform support for scatter-gather
- [x] modify buffer-raw-related tile API
- [x] add async load (non-raw version) API
- [x] add block_tile_reduce_xor_sync() API
- [x] add BlockReduce2D operator for thread+warp reduce
- [x] add example permute
- [x] add example elementwise
- [x] add unpack-static-ford/unpack-sweep-tile-span
- [x] add tile_window_linear to better control flag/voffset
- [ ] add permute utility kernel for moe index
- [ ] debug moe-ffn pipeline
- [ ] add example moe
- [ ] refine fa pipeline
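The checklist includes a topk-softmax example, the gating step that feeds the MoE work still in progress. The following is a rough host-side reference of the kind such an example is usually checked against. It assumes that softmax is taken over all expert logits of one token before the k largest probabilities are selected. `topk_softmax_ref` is a hypothetical name, not a composable_kernel API, and the kernel in this PR may order, renormalize, or tie-break its outputs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical CPU reference (not a composable_kernel API) for one token:
// softmax over all expert logits, then keep the k largest probabilities
// together with their expert indices, sorted by probability (descending).
std::vector<std::pair<int, float>> topk_softmax_ref(const std::vector<float>& logits, int k)
{
    assert(!logits.empty() && k >= 0);

    // Subtract the max logit so std::exp cannot overflow.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> prob(logits.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < logits.size(); ++i)
    {
        prob[i] = std::exp(logits[i] - max_logit);
        sum += prob[i];
    }
    for (float& p : prob)
        p /= sum;

    // Rank expert indices by probability; stable_sort keeps the lower index first on ties.
    std::vector<int> idx(prob.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(), [&](int a, int b) { return prob[a] > prob[b]; });

    // Emit at most k (expert index, probability) pairs.
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k && i < static_cast<int>(idx.size()); ++i)
        out.emplace_back(idx[i], prob[idx[i]]);
    return out;
}
```

For example, `topk_softmax_ref({0.1f, 2.0f, -1.0f, 1.5f}, 2)` selects experts 1 and 3, in that order. A device kernel would typically do the max and sum steps with warp-level reductions rather than a serial loop. The per-row semantics to compare against stay the same.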