Iron-Bound

Results: 14 comments by Iron-Bound

Seconding this ^ The MI300 and H100 are both battling it out at the moment, so I'd like to use my 7900 XTX!

For the Makefile check we could do a basic check based on the gfx version, which also avoids importing anything. Another option would be to call `rocminfo` or `clinfo`: ``` PYTORCH_ROCM_ARCH :=...
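
To make the idea concrete, here is a rough Python sketch of the `rocminfo`-based check; the `detect_gfx_arch` helper, its regex, and the `gfx1100` fallback are illustrative assumptions on my part, not code from the thread.

```python
# Sketch: detect the gfx target by parsing rocminfo output, so the build
# step doesn't need to import torch. Helper name and fallback are assumptions.
import re
import subprocess

def detect_gfx_arch(default: str = "gfx1100") -> str:
    """Return the first gfx target reported by rocminfo, e.g. 'gfx1100'."""
    try:
        out = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return default  # rocminfo unavailable: fall back to a sensible default
    match = re.search(r"gfx\d+[a-z]*", out)
    return match.group(0) if match else default

if __name__ == "__main__":
    # The result could then be fed into PYTORCH_ROCM_ARCH from the Makefile.
    print(detect_gfx_arch())
```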

I've had success with my 7900 XTX w/ bfloat16 acceleration, so I'd recommend we use that as a first target, and the ASM here can also work until hipBLASLt is more...
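
For reference, a minimal sketch of what bfloat16 acceleration looks like via PyTorch autocast; the `torch.nn.Linear` model and tensor shapes are placeholders, not anything from the thread.

```python
# Sketch: run a forward pass under bfloat16 autocast. ROCm GPUs also show up
# under the "cuda" device type in PyTorch ROCm builds.
import torch

model = torch.nn.Linear(1024, 1024).to("cuda")      # placeholder model
x = torch.randn(8, 1024, device="cuda")             # placeholder input

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16 inside the autocast region
```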

Very cool work @cameronshinn, I'm sure people in the community will be able to use it! If you haven't seen it already, we have a Discord community studying all things...

Haven't seen any official statement, but people are working on it: https://github.com/zhuzilin/ring-flash-attention

You'd have to let it run for 11M-20M before you can really tell the status; see the experiments here: https://wandb.ai/iron-bound/pufferlib/runs/sjwhhk4r?workspace=user-iron-bound

> Do you still get stuck in the lab with the new fast training script?

It's much better now and a welcome surprise 😁

> Brings back more of the...

Let me just say thanks, this is great work! In training (Intel 13th gen / 7900 XTX / 32 GB) I can report: system memory usage dropped 20-30%, GPU utilization is up from 2% to 7%, and...

It's picking it up automatically. For details, I'm running a 13700K / 32 GB RAM / 7900 XTX system with the ROCm 5.7 container.

> Then it doesn't like either of my NVIDIA cards. It hasn't touched my cards with the old or new version.

It should mention using CUDA when starting the script,...
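
As a quick way to confirm which backend gets picked up at startup, a check along these lines would surface the device name; the printed messages are illustrative, not the script's actual log output.

```python
# Sketch: report the detected GPU (or lack of one) when the script starts.
import torch

if torch.cuda.is_available():
    # ROCm builds also take this branch; the device name shows which GPU it is.
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected, falling back to CPU")
```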