composable_kernel
[Ck tile] Use raw store to improve layernorm performance
Proposed changes
- Simpler kernel example for layernorm
- Use store_tile_raw for Default2DEpilogueProblem to improve performance
Checklist
Use the following command to check performance:
make -j tile_layernorm2d_fwd && ./bin/tile_layernorm2d_fwd -m=128 -n=8192 -prec_i=bf16 -fadd=1