
[GPU][DT] Tracking issue for data-tiled Llama 3.1 405B

Open · jtuyls opened this issue 3 months ago · 0 comments

Enabling data tiling for Llama 3.1 405B requires a few new features and fixes. This issue tracks the sub-tasks and their progress, and is the place to discuss performance numbers once we get there.

Some of the initial tasks:

  • [ ] The memory footprint of data-tiled execution needs to be reduced so that the 405B model weights fit on a single GPU: https://github.com/iree-org/iree/issues/21659
  • [x] We need support for scaled matmuls with encodings, because the main matmuls operate on the mxfp4 data type: https://github.com/iree-org/iree/issues/21923
  • [ ] We need to implement a ukernel with the mxfp4 data type and MFMAs: https://github.com/iree-org/iree/issues/21938
  • [x] Get the Llama 405B MLIR without the asm/wave mxfp4 kernels: https://github.com/iree-org/iree/issues/22002
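For a sense of scale behind the single-GPU memory goal, here is a rough back-of-the-envelope estimate of the weight storage. It is an illustrative sketch, not a measurement: it assumes every parameter is stored in mxfp4 as defined by the OCP Microscaling (MX) format (blocks of 32 FP4 E2M1 elements sharing one 8-bit E8M0 scale), whereas real checkpoints may keep some tensors in wider types, and any padding or extra copies introduced by data-tiled layouts would come on top of this. The helper names `mxfp4_bytes` and `bf16_bytes` are hypothetical.

```python
# Back-of-the-envelope weight storage estimate for a 405B-parameter model.
# Assumption: ALL parameters are stored in MXFP4 (OCP MX format), i.e. blocks
# of 32 FP4 (E2M1) elements that share one 8-bit E8M0 scale. Padding or extra
# copies from data-tiled layouts are NOT included.

MX_BLOCK_SIZE = 32  # elements per shared scale
ELEMENT_BITS = 4    # FP4 E2M1 element
SCALE_BITS = 8      # E8M0 shared scale, one per block


def mxfp4_bytes(num_params: int) -> float:
    """Bytes needed to store num_params values in MXFP4 (elements + scales)."""
    num_blocks = -(-num_params // MX_BLOCK_SIZE)  # ceil division
    total_bits = num_params * ELEMENT_BITS + num_blocks * SCALE_BITS
    return total_bits / 8


def bf16_bytes(num_params: int) -> float:
    """Bytes needed to store num_params values in bf16 (2 bytes each)."""
    return num_params * 2


if __name__ == "__main__":
    params = 405 * 10**9
    print(f"MXFP4: {mxfp4_bytes(params) / 1e9:.1f} GB")
    print(f"BF16:  {bf16_bytes(params) / 1e9:.1f} GB")
```

Under these assumptions the script prints about 215.2 GB for mxfp4 versus 810.0 GB for bf16, i.e. roughly 4.25 bits per weight once the shared scales are counted.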

jtuyls · Sep 11 '25 07:09