dace
Gemm FPGA
Describe the bug
Certain GEMM configurations for FPGA currently stall on Xilinx.
For Intel, there is a small issue with the generated code.
To Reproduce
Steps to reproduce the behavior, for Xilinx:
python3 gemm_systolic_vectorized.py 384 384 384 48 8 --tile-size-n=48 --tile-size-m=192
For Intel, any size produces erroneous generated code. The issue is in the transposition kernel, which always reads from the same channel (0) instead of using the loop iteration variable (k1).
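To illustrate the Intel problem, here is a minimal Python sketch of the indexing error. It is not the generated OpenCL code: the channels are modelled as plain FIFOs, and the helper names (make_channels, read_buggy, read_fixed) are hypothetical. Only the channel index 0 and the loop variable k1 come from the report above.

```python
from collections import deque


def make_channels(num_channels, items_per_channel):
    # One FIFO per channel; channel c holds the tuples (c, 0), (c, 1), ...
    return [deque((c, i) for i in range(items_per_channel))
            for c in range(num_channels)]


def read_buggy(channels, num_iters):
    # Reported behaviour: the channel index is hard-coded to 0,
    # so the loop variable k1 is never used to select a channel.
    return [channels[0].popleft() for k1 in range(num_iters)]


def read_fixed(channels, num_iters):
    # Expected behaviour: iteration k1 reads from channel k1.
    return [channels[k1].popleft() for k1 in range(num_iters)]


buggy = read_buggy(make_channels(4, 4), 4)
fixed = read_fixed(make_channels(4, 4), 4)
print("buggy:", buggy)
print("fixed:", fixed)
```

In this model the buggy reader drains only channel 0, while the fixed reader takes one element from each channel, which is presumably what the transposition kernel needs.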
Additional context
This is probably due to the first part (reading from A and the transposition): some of the maps seem to loop over K/mem_veclen, while others use TN.
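A rough sketch of why mismatched trip counts can produce a stall, assuming the two sides communicate through one bounded, blocking FIFO. The function pipeline_stalls and the numeric values are hypothetical and are not taken from the failing command; they only show that a producer looping K/mem_veclen times and a consumer looping TN times finish together only if the counts are compatible.

```python
def pipeline_stalls(producer_trips, consumer_trips, fifo_depth):
    """Return True if a producer/consumer pair over one bounded,
    blocking FIFO can never finish (simplified model)."""
    if consumer_trips > producer_trips:
        # The consumer blocks on a read that is never written.
        return True
    # Once the consumer has finished, leftover writes must fit in the FIFO;
    # otherwise the producer blocks on a full FIFO forever.
    return producer_trips - consumer_trips > fifo_depth


# Illustrative values only (assumptions, not the sizes from the report).
K, mem_veclen, TN, depth = 64, 4, 8, 4

mismatched = pipeline_stalls(K // mem_veclen, TN, depth)          # 16 writes, 8 reads
matched = pipeline_stalls(K // mem_veclen, K // mem_veclen, depth)  # 16 writes, 16 reads
print("mismatched stalls:", mismatched)
print("matched stalls:", matched)
```

If this is indeed the cause, checking that every map in the read-and-transpose stage uses the same loop bound would be a reasonable first step.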