AMDMIGraphX
AMD's graph optimization engine.
Pick a kernel and collect its runtime as a function of the number of repetitions, with warm-up set to zero. The goal is to find out exactly when the GPU clocks throttle down...
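A minimal sketch of the measurement described above, assuming a generic `workload` callable that stands in for the chosen kernel. The helper names `per_iteration_times` and `first_slowdown`, and the 10-sample baseline and 1.2x threshold, are hypothetical choices for illustration, not part of MIGraphX. Each iteration is timed separately in one continuous run, so a throttling point shows up as the iteration where per-run time jumps. For a real GPU kernel, `workload` would also have to block until the kernel finishes; otherwise only launch overhead is measured.

```python
import statistics
import time


def per_iteration_times(workload, repetitions, warmup=0):
    """Run `workload` `repetitions` times back to back; return each run's wall time in seconds.

    warmup defaults to 0 so the earliest iterations (before clocks ramp up
    or throttle) are recorded as well.
    """
    for _ in range(warmup):
        workload()
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        workload()  # for a GPU kernel this must include a device synchronize
        times.append(time.perf_counter() - start)
    return times


def first_slowdown(times, baseline_count=10, threshold=1.2):
    """Return the index of the first iteration after the baseline window whose
    time exceeds `threshold` x the baseline median, or None if there is none."""
    if len(times) <= baseline_count:
        raise ValueError("need more samples than baseline_count")
    # The median keeps a single cold-start outlier from skewing the baseline.
    baseline = statistics.median(times[:baseline_count])
    for i in range(baseline_count, len(times)):
        if times[i] > threshold * baseline:
            return i
    return None
```

For example, `first_slowdown(per_iteration_times(lambda: sum(range(100_000)), 500))` reports the first iteration that ran noticeably slower than the initial ones. Timing each iteration individually, rather than re-running the loop for increasing repetition counts, keeps the measurement a single continuous load on the device.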
- Should be reviewed after https://github.com/ROCm/AMDMIGraphX/pull/3866 is merged. - Partially resolves https://github.com/migraphx-benchmark/AMDMIGraphX/issues/200 - Adds GPU support for SparseAttention using simple handmade kernels, which will be replaced by an implementation that uses...
This tunes the block size and the choice of algorithm. It also fixes the benchmarking.
Based on changes from https://github.com/ROCm/AMDMIGraphX/pull/3865; these were split out from that branch.
~~A few error messages are being piped to `cerr`, while most go to `cout`; this makes them consistent.~~ Redirects warnings that currently stream to `cout` into `cerr` instead. (Ideally the verify...
This implements a faster GPU topk. * Updated the ref version of topk to take a parameter for the indices and to handle any layout. * Added a...
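To illustrate what "handle any layout" can mean for a reference topk, here is a minimal sketch, assuming the input is a flat buffer addressed through arbitrary element strides. The function names `topk_ref` and `packed_strides` are hypothetical, this is not the MIGraphX ref implementation, and it does not model the new indices parameter. Every slice along `axis` is gathered through its stride, sorted, and the top `k` values and their positions are written to densely packed (row-major) outputs.

```python
import itertools


def packed_strides(shape):
    """Row-major (standard) strides for a densely packed tensor."""
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return list(reversed(strides))


def topk_ref(data, shape, strides, axis, k, largest=True):
    """Top-k along `axis` of a tensor stored in flat `data` with any `strides`.

    Returns (out_shape, values, indices); the outputs are packed row-major
    and out_shape equals shape with shape[axis] replaced by k.
    """
    n = shape[axis]
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= shape[axis]")
    out_shape = list(shape)
    out_shape[axis] = k
    out_strides = packed_strides(out_shape)
    size = 1
    for dim in out_shape:
        size *= dim
    values = [None] * size
    indices = [None] * size
    # Visit each slice along `axis` by fixing every other coordinate (axis coordinate = 0).
    ranges = [range(1) if d == axis else range(shape[d]) for d in range(len(shape))]
    for idx in itertools.product(*ranges):
        in_base = sum(i * s for i, s in zip(idx, strides))
        out_base = sum(i * s for i, s in zip(idx, out_strides))
        column = [(data[in_base + j * strides[axis]], j) for j in range(n)]
        # Python's sort stays stable with reverse=True, so ties keep the lower index first.
        column.sort(key=lambda t: t[0], reverse=largest)
        for r, (v, j) in enumerate(column[:k]):
            values[out_base + r * out_strides[axis]] = v
            indices[out_base + r * out_strides[axis]] = j
    return out_shape, values, indices
```

Because all element access goes through `strides`, the same logical tensor gives identical results whether it is stored row-major (`strides=[3, 1]` for shape `[2, 3]`) or column-major (`strides=[1, 2]`).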
Updates the `find_splits` matcher: * Skips trying to fuse instructions that have inter-group dependencies (where one split group depends on another, such as in the case `sigmoid(x) + x`) * Allows fusing...
SD21 appears to run out of memory at higher batch sizes on Navi4x systems.
QA tests for UNet on Ubuntu 24.04 are failing. This appears to be caused by an incorrect `PYTHONPATH`.