MISA
refactor generator code
need to generalize the code generation logic across different direction, precision, and arch
- global load/store:
  - [ ] support different precision: fp32/fp16 (short)/ubyte
  - [ ] support 2d/3d load, with exec mask from different dimensions
  - [ ] support `global_load`/`buffer_load`, and accumulate through sgpr/vgpr
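As a rough illustration of the precision item above, a generator could derive the load mnemonic from the element size and vector length instead of hard-coding it per kernel. This is a minimal sketch, not the MISA implementation: `select_load`, `BYTES_PER_ELEMENT`, and `WIDTH_SUFFIX` are hypothetical names, and the suffix table assumes the usual GCN-style mnemonics (`_ubyte`, `_ushort`, `_dword`, `_dwordx2`, `_dwordx4`), which should be checked against the ISA guide for the target arch.

```python
# Bytes per element for the precisions named in the checklist.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "ubyte": 1}

# Total load width in bytes -> mnemonic suffix.
# ASSUMPTION: GCN-style naming; verify per target arch.
WIDTH_SUFFIX = {1: "ubyte", 2: "ushort", 4: "dword", 8: "dwordx2", 16: "dwordx4"}


def select_load(kind, precision, vector_len):
    """Return a load mnemonic for `vector_len` elements of `precision`.

    `kind` is either "global_load" or "buffer_load".
    """
    if kind not in ("global_load", "buffer_load"):
        raise ValueError(f"unknown load kind: {kind}")
    if precision not in BYTES_PER_ELEMENT:
        raise ValueError(f"unknown precision: {precision}")
    nbytes = BYTES_PER_ELEMENT[precision] * vector_len
    if nbytes not in WIDTH_SUFFIX:
        raise ValueError(f"no single load covers {nbytes} bytes")
    return f"{kind}_{WIDTH_SUFFIX[nbytes]}"
```

For example, `select_load("buffer_load", "fp16", 2)` picks a one-dword load, since two fp16 values occupy four bytes.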
- shared memory load/store:
  - [ ] support 1d/2d load/store for different precision
  - [ ] support k pack
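One way to picture k pack in shared memory: a `[K][M]` tile is stored as `[K / k_pack][M][k_pack]`, so the `k_pack` consecutive k values for one m sit next to each other and can be fetched with a single wider LDS read. The sketch below assumes that layout; `lds_offset` and its parameters are hypothetical and are not taken from the generator.

```python
def lds_offset(k, m, tile_m, k_pack, bytes_per_element):
    """Byte offset of element (k, m) in an LDS tile laid out as
    [K / k_pack][tile_m][k_pack] (ASSUMED layout, for illustration only)."""
    if k_pack <= 0:
        raise ValueError("k_pack must be positive")
    if not 0 <= m < tile_m:
        raise ValueError("m out of range")
    # Split k into the slow-moving outer index and the packed inner index.
    k_outer, k_inner = divmod(k, k_pack)
    element_index = (k_outer * tile_m + m) * k_pack + k_inner
    return element_index * bytes_per_element
```

With `k_pack = 1` this degenerates to the plain `[K][M]` layout, so the same helper would cover both the packed and unpacked cases.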
- coalescing store:
  - [ ] support multiple groups to do coalescing store
  - [ ] support fp16/int8 pack operation on final store out
  - [ ] support cases that do not need LDS shuffle
  - [ ] support vector write out
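The fp16/int8 pack item amounts to squeezing several narrow results into one 32-bit register before the store. The following host-side sketch shows the bit layout only, assuming little-endian packing with the first element in the low bits; `pack_fp16_pair` and `pack_int8_quad` are hypothetical helpers, and the real generator would emit the equivalent conversion and pack instructions rather than run Python.

```python
import struct


def pack_fp16_pair(lo, hi):
    """Pack two floats, rounded to fp16, into one 32-bit word.
    `lo` lands in bits [15:0] and `hi` in bits [31:16]."""
    (lo_bits,) = struct.unpack("<H", struct.pack("<e", lo))
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits


def pack_int8_quad(values):
    """Pack four signed 8-bit integers into one 32-bit word,
    with values[0] in the lowest byte."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    word = 0
    for i, v in enumerate(values):
        if not -128 <= v <= 127:
            raise ValueError(f"value out of int8 range: {v}")
        word |= (v & 0xFF) << (8 * i)
    return word
```

A packed word like this is what a vector write out would then store, which is why the pack step and the store width have to be chosen together.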
- mfma main loop:
  - [ ] support different repeat/step
  - [ ] support both with and without inst-schedule
  - [ ] support k pack suited to the instruction requirement and precision
  - [ ] support shared load of multiple k_pack at once, then do mfma multiple times
  - [ ] pass through LDS
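To make the "load multiple k_pack at once, then do mfma multiple times" item concrete, here is a sketch of one unscheduled loop body expressed as a list of abstract ops. Everything in it is an assumption for illustration: `mfma_schedule`, the op names, and the split between `lds_k_pack` (elements fetched per LDS read) and `inst_k_pack` (elements one mfma consumes) are hypothetical and do not describe the existing generator.

```python
def mfma_schedule(repeat_m, repeat_n, lds_k_pack, inst_k_pack):
    """Return the op list for one loop body: read A and B once per repeat,
    then issue one mfma per (k chunk, repeat_m, repeat_n) combination."""
    if lds_k_pack % inst_k_pack != 0:
        raise ValueError("lds_k_pack must be a multiple of inst_k_pack")
    ops = []
    # One wide shared-memory read per repeat covers every k chunk.
    for i_m in range(repeat_m):
        ops.append(("lds_read_a", i_m))
    for i_n in range(repeat_n):
        ops.append(("lds_read_b", i_n))
    # Reuse the loaded registers for several mfma issues.
    for i_k in range(lds_k_pack // inst_k_pack):
        for i_m in range(repeat_m):
            for i_n in range(repeat_n):
                ops.append(("mfma", i_m, i_n, i_k))
    return ops
```

An instruction scheduler could take this flat list as input and interleave the reads with the mfma ops, while the no-schedule path would emit it in order.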
- fma main loop
- thread mapping