AMDMIGraphX
AMD's graph optimization engine.
First round in addressing #4334 ## Motivation rocMLIR has implemented split kv and GQA, which enables us to implement flash decoding. Now we need to add this to the migraphx...
The whole purpose of the op-builders is to factor out the main functionalities behind the current onnx parsers and put them in a separate set of files (under the folder...
## Motivation MLIR doesn't support fp32 GEG fusion on Navi. ## Technical Details Disable GEG fusion for fp32, and enable GEG in Jenkins CI. ## Changelog Category - - [...
## Motivation * Part of https://github.com/ROCm/AMDMIGraphX-internal/issues/149 ## Technical Details * Requires https://github.com/ROCm/rocMLIR/tree/packFp4 from rocMLIR to work on MI350. * This will pass CI since CI doesn't run on an MI350...
## Motivation Disable matchers by default for dynamic shape graphs. ## Technical Details Updates based on comments in #4347. Changes need to be applied on top of #4316. ## Changelog...
Implement flash decoding as described here: https://pytorch.org/blog/flash-decoding/ We have attention operators grouped like this: ``` Q -> [B, M, k] K -> [B, k, N] V -> [B, N, D]...
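The split-KV idea behind flash decoding can be illustrated with a minimal pure-Python sketch. It is not the MIGraphX or rocMLIR implementation: the function names (`attend_chunk`, `flash_decode`, `reference_attention`) are hypothetical, it handles a single query vector with no batch dimension, it stores K as N row vectors of length k rather than the `[B, k, N]` layout above, and it omits the usual 1/sqrt(k) scaling for brevity. Each KV chunk produces a partial result together with its running max and sum, and the partials are then rescaled to a common max and combined.

```python
import math


def attend_chunk(q, keys, values):
    """Attention of one query over one KV chunk.

    Returns (chunk_max, chunk_sum, unnormalized_output) so that
    partial results from several chunks can be merged later.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # stable exponentials
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, total, out


def flash_decode(q, keys, values, num_splits):
    """Split K/V along the sequence axis, attend per chunk, then combine."""
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = [
        attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, n, chunk)
    ]
    # Rescale every partial to the global max before summing.
    global_max = max(m for m, _, _ in partials)
    dim = len(values[0])
    denom = 0.0
    numer = [0.0] * dim
    for m, total, out in partials:
        scale = math.exp(m - global_max)
        denom += scale * total
        for d in range(dim):
            numer[d] += scale * out[d]
    return [x / denom for x in numer]


def reference_attention(q, keys, values):
    """Plain softmax attention over the full sequence, for comparison."""
    _, total, out = attend_chunk(q, keys, values)
    return [x / total for x in out]
```

In this sketch the per-chunk calls are independent of each other, which is what allows the chunks to run in parallel on real hardware; only the final combine step needs all of the partial results.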
Updated the image build process and removed deprecated code. Added stages for checking and building Docker images, and organized test stages for various configurations. ## Motivation Testing to determine if this...
## Motivation The scale values could underflow or overflow. To avoid those cases, clamp on both sides. ## Technical Details ## Changelog Category - - [ ] Added: New...
## Motivation * Introduce Float8E8M0 type within MIGraphX for better MXFP4 optimizations and to use hipblaslt mxfp4 kernels. ## Technical Details ## Changelog Category - - [ ] Added: New...