Iron-Bound
Seconding this ^ The MI300 and H100 are both battling it out at the moment, so I'd like to use my 7900 XTX!
For the makefile check we could do a basic check against the gfx version, which also avoids importing anything. The other option would be to call `rocminfo` or `clinfo`:
```
PYTORCH_ROCM_ARCH :=...
```
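For reference, a minimal sketch of the `rocminfo` route, assuming `rocminfo` is on PATH; `detect_gfx_arch` and the `gfx1100` fallback are just illustrative names, not anything in the repo:

```python
import shutil
import subprocess

def detect_gfx_arch(default: str = "gfx1100") -> str:
    # Hypothetical helper: grab the first gfx target from `rocminfo` output
    # without importing torch, falling back to a default when rocminfo is
    # missing or prints nothing usable.
    if shutil.which("rocminfo") is None:
        return default
    out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    for token in out.split():
        if token.startswith("gfx"):
            return token
    return default

print(detect_gfx_arch())  # e.g. "gfx1100" on a 7900 XTX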
I've had success with my 7900 XTX with bfloat16 acceleration, so I'd recommend we use that as a first target, and the asm here can also work until hipBLASLt is more...
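By "bfloat16 acceleration" I mean something like the sketch below: plain PyTorch, assuming a ROCm build where the card shows up through the usual `cuda` device APIs:

```python
import torch

# Minimal sketch: run a matmul under bfloat16 autocast, which is the
# path that works well on the 7900 XTX today.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    c = a @ b
print(c.dtype)  # torch.bfloat16 inside the autocast region
```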
Very cool work @cameronshinn, I'm sure people in the community will be able to use it! If you haven't seen it already, we have a Discord community studying all things...
Haven't seen any official statement, but people are working on it: https://github.com/zhuzilin/ring-flash-attention
You'd have to let it run for 11M-20M steps before you can really tell the status; see the experiments here: https://wandb.ai/iron-bound/pufferlib/runs/sjwhhk4r?workspace=user-iron-bound
> Do you still get stuck in the lab with the new fast training script?

It's much better now and a welcome surprise 😁

> Brings back more of the...
Let me just say thanks, this is great work! In training (Intel 13th gen / 7900 XTX / 32 GB) I can report: system memory usage dropped 20-30%, GPU utilization is up from 2% to 7%, and...
It's picking it up automatically. For details: I'm running a 13700K / 32 GB RAM / 7900 XTX system with the ROCm 5.7 container.
> Then it doesn't like either of my NVIDIA cards. It hasn't touched my cards with the old or new version.

It should mention using CUDA when starting the script,...
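If it helps debug, a quick check like this shows whether PyTorch can see the cards at all before the script starts (plain PyTorch calls, not the script's own output; on ROCm builds the AMD card also reports through the `cuda` APIs):

```python
import torch

# Sanity check: does PyTorch see any GPU, and which one?
print("available:", torch.cuda.is_available())
print("count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
```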