Iron-Bound

Results 5 comments of Iron-Bound

Seconding this ^ MI300 and H100 are both battling at the moment, so I would like to use my 7900xtx!

For the Makefile check we could do a basic check based on the gfx version, which also avoids importing anything. The other option would be to call `rocminfo` or `clinfo` ``` PYTORCH_ROCM_ARCH :=...
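
A rough sketch of the `rocminfo` route, assuming the tool prints gfx target names such as `gfx1100` somewhere in its output. The helper names `parse_gfx_targets` and `detect_gfx_targets` are made up for illustration, and joining the result with `;` for `PYTORCH_ROCM_ARCH` is an assumption about the expected format, not something taken from the Makefile in question. Only the standard library is used, so nothing heavy gets imported.

```python
import re
import shutil
import subprocess

# gfx target names look like gfx906, gfx90a, gfx1100 (hex digits after "gfx").
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]{3,5}\b")


def parse_gfx_targets(text):
    """Return the unique gfx targets found in text, in order of first appearance."""
    seen = []
    for match in GFX_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def detect_gfx_targets(tool="rocminfo"):
    """Run the tool if it is on PATH; return [] when it is missing or fails."""
    if shutil.which(tool) is None:
        return []
    try:
        result = subprocess.run(
            [tool], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_gfx_targets(result.stdout)


if __name__ == "__main__":
    # Assumed format: semicolon-separated list of targets.
    print(";".join(detect_gfx_targets()))
```

A Makefile could capture the printed value with `$(shell ...)`. On a machine without ROCm installed the script prints an empty line, so the build can fall back to a default.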

I've had success with my 7900xtx w/ bfloat16 acceleration, so I'd recommend we use that as a first target, and the Asm here can also work until hipblaslt is more...

Very cool work @cameronshinn, I'm sure people in the community will be able to use it! If you haven't seen it already, we have a Discord community studying all things...

Haven't seen any official statement, but people are working on it: https://github.com/zhuzilin/ring-flash-attention