[CPU][ArmSME] Update tiling to use all SME accumulators

Open MacDue opened this issue 1 year ago • 2 comments

Previously, we only tiled for a single SME accumulator. This patch updates the lowering_config to make use of all SME accumulators.

This is done by increasing the tile size to [8]x[8] for f32 and to [4]x[8] for f64. This lowers to four [4]x[4] 32-bit accumulators and eight [2]x[2] 64-bit accumulators respectively.
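As a rough check of the arithmetic above, the sketch below counts how many SME accumulator tiles a given scalable tile size covers. It assumes the architectural minimum streaming vector length of 128 bits (so the counts are in units of vscale) and the standard ZA tile counts (four 32-bit tiles, eight 64-bit tiles). The helper names are hypothetical and are not IREE or MLIR APIs.

```python
# Sketch only: these helpers are hypothetical, not part of IREE or MLIR.
MIN_SVL_BITS = 128  # assumed: minimum SME streaming vector length, in bits


def sme_tile_shape(elem_bits):
    """Scalable shape [n]x[n] of a single ZA tile for this element width."""
    n = MIN_SVL_BITS // elem_bits
    return (n, n)


def num_sme_tiles(elem_bits):
    """Number of ZA tiles for this element width (ZA0-3.S, ZA0-7.D)."""
    return elem_bits // 8


def accumulators_used(tile_rows, tile_cols, elem_bits):
    """How many ZA tiles a [tile_rows]x[tile_cols] tile decomposes into."""
    rows, cols = sme_tile_shape(elem_bits)
    # The tile size must be a whole multiple of the hardware tile shape.
    assert tile_rows % rows == 0 and tile_cols % cols == 0
    return (tile_rows // rows) * (tile_cols // cols)


for name, bits, (r, c) in [("f32", 32, (8, 8)), ("f64", 64, (4, 8))]:
    rows, cols = sme_tile_shape(bits)
    used = accumulators_used(r, c, bits)
    print(f"{name}: [{r}]x[{c}] -> {used} of {num_sme_tiles(bits)} "
          f"[{rows}]x[{cols}] accumulators")
```

Under these assumptions, both tile sizes use every accumulator available for their element type (4 of 4 for f32, 8 of 8 for f64).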

These tile sizes need some additional vector legalization passes, which have now been added to the ArmSME pipeline.

MacDue avatar Feb 13 '24 17:02 MacDue

cc @c-rhodes

MacDue avatar Feb 13 '24 17:02 MacDue

Note: This patch now needs a fix upstream due to #16350 (we need to legalize arith.constants). It should be a simple fix, but I'll have to wait until the next LLVM integration.

Edit: There are a few more issues to look into :pensive:

MacDue avatar Feb 13 '24 19:02 MacDue

@hanhanW, @MaheshRavishankar if this looks okay, could we land this today? :pray:

P.S. I don't have write access.

MacDue avatar May 17 '24 12:05 MacDue