Iron-Bound
Seconding this ^ MI300 and H100 are both battling at the moment, so I'd like to use my 7900xtx!
For the makefile check we could do a basic check from the gfx version, which also avoids importing anything. Another option would be to call `rocminfo` or `clinfo` ``` PYTORCH_ROCM_ARCH :=...
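A rough sketch of the `rocminfo` route, in case it helps: it assumes `rocminfo` prints the agent's gfx target (e.g. `gfx1100` for a 7900xtx) somewhere in its output, and just scans for it with a regex. The helper names `parse_gfx_targets` and `detect_gfx_targets` are made up for illustration and are not part of any existing build script.

```python
import re
import subprocess

# Matches target names such as gfx906, gfx90a, gfx942 or gfx1100.
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]{3,4}\b")


def parse_gfx_targets(rocminfo_text):
    """Return the unique gfx targets found in the text, in first-seen order."""
    targets = []
    for name in GFX_PATTERN.findall(rocminfo_text):
        if name not in targets:
            targets.append(name)
    return targets


def detect_gfx_targets():
    """Run rocminfo and parse its output; return [] if it is missing or fails."""
    try:
        result = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gfx_targets(result.stdout)
```

A makefile could shell out to something like this only as a fallback, so the plain gfx-version check stays import-free.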
I've had success with my 7900xtx w/ bfloat16 acceleration, so I'd recommend we use that as a first target, and the Asm here can also work until hipblaslt is more...
Very cool work @cameronshinn, I'm sure people in the community will be able to use it! If you haven't seen it already, we have a discord community studying all things...
Haven't seen any official statement, but people are working on it: https://github.com/zhuzilin/ring-flash-attention