AMD ROCm Support
Adding AMD ROCm Support throughout Transformer Lab app
Notes
- vLLM needs fixing: a Ray error currently blocks it on AMD.
- Nanotron requires flash-attn, and the flash-attn build currently breaks for us; the error is vague, so this needs investigation.
- The Unsloth GRPO Trainer won't work because it depends on bitsandbytes.
- Multi-GPU trainers that run through Accelerate use mixed precision by default, which doesn't work well when the device is AMD; we need to figure this out (see the sketch after this list).
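For the Accelerate point above, a minimal sketch of the kind of workaround being considered (illustrative only, not Transformer Lab's actual code):

```python
# Sketch: skip mixed precision when torch reports a ROCm/HIP build,
# since Accelerate's default mixed-precision path has been problematic on AMD.
import torch
from accelerate import Accelerator

# torch.version.hip is set on ROCm wheels and None on CUDA/CPU builds
on_rocm = torch.version.hip is not None

accelerator = Accelerator(mixed_precision="no" if on_rocm else "bf16")
print("mixed precision:", accelerator.mixed_precision)
```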
Current blockers:
- WSL does not support rocm-smi, which makes usage tracking difficult.
- The pytorch 2.7+rocm6.3 binary doesn't work on WSL: `torch.cuda.is_available()` returns False even though `torch.version.hip` is available (minimal repro below).
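A minimal repro of that second blocker (assuming the pytorch 2.7+rocm6.3 wheel is installed on WSL):

```python
# Repro of the WSL blocker: the ROCm wheel reports a HIP version,
# but no device is actually visible to torch.
import torch

print("torch.version.hip:", torch.version.hip)                  # set on the rocm6.3 wheel
print("torch.cuda.is_available():", torch.cuda.is_available())  # False on WSL
```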
Just sharing this link in case it has information that might help resolve the issue; I don't have enough knowledge to tell whether it will help or not.
https://www.reddit.com/r/LocalLLaMA/comments/1kp6gdv/rocm_64_current_unsloth_working/
Thanks for the link @charmandercha! I was able to resolve this with some advice from the issue here. Unfortunately we can't track GPU usage currently, as rocm-smi doesn't work on WSL with an AMD GPU, but we did get everything else running. Just waiting on a couple of smoke tests now.
Thanks for the feedback. I'll take a closer look at some point, and then I'll see if I can get the installation right on Linux, as I don't use Windows.
To anyone tracking this:
We now support AMD GPUs - install instructions here :) Please provide some feedback and we'd be happy to keep improving on this!
(cc @charmandercha)
Thank you! I read the docs, and Transformer Lab says flash attention isn't supported on my machine.
It seems to find the GPU but doesn't show any information about memory capacity, for example.
GPU: ✅ | CUDA Version: n/a | Python MPS: ❌ | Flash Attention: ❌ | Flash Attn Version: n/a
Is there something wrong with my installation, or has the interface not yet received updates, which is why it still doesn't show information about my card? (It's okay if that's the case, I just want to understand what's going on.)
Is there anything specific I should test for you?
I'm a Linux user, on Pop!_OS 22.04 LTS.
The rocm-smi package installation seems OK.
rocminfo reports: ROCk module is loaded
Hi, we recommend using this on native Ubuntu (22.04 or 24.04 for now, since that's what has been tested). Pop!_OS has been having some issues with the newer ROCm bare-metal installations (ref).
I see, so this means the entire interface should already be showing information about flash-attention, the AMD GPU name, and everything else, right?
I just wanted to confirm, because if the interface is already fully configured, I can try to resolve it over the weekend; then I'd know through visual feedback that things are working properly.
Everything except flash-attn should be shown! We got rid of flash-attn, so that "not installed" message can be ignored, and we'll remove it soon. We use pyrsmi, so if you'd like to debug your system to see whether something is detected, you can make use of that package (a sketch below). The interface should work as well if everything is in order.
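For example, a rough pyrsmi sanity check might look like this (function names follow the ROCm/pyrsmi README, so double-check them against your installed version):

```python
# Sketch of a pyrsmi sanity check: list detected AMD GPUs and their memory.
# Assumes `pip install pyrsmi`; function names per the ROCm/pyrsmi README.
from pyrsmi import rocml

rocml.smi_initialize()
for dev in range(rocml.smi_get_device_count()):
    name = rocml.smi_get_device_name(dev)
    used = rocml.smi_get_device_memory_used(dev)
    total = rocml.smi_get_device_memory_total(dev)
    print(f"GPU {dev}: {name}: {used / 1e9:.1f} / {total / 1e9:.1f} GB used")
rocml.smi_shutdown()
```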
Could you answer a question for me?
You're going to remove flash-attention; wasn't that software meant to speed things up? Is it already outdated?
It ran on Pop!_OS, I just had to remove all the folders from my previous transformer-lab installation, and now even the AMD icon is being rendered.
In the instructions you said that if two specific packages worked, transformer-lab would probably run; they were working, so I removed everything that was there before and redid the installation.
I installed ROCm on my machine using AMD's official instructions, but it's been a while; I remember the first time I got an error it was because I selected the 24.04 tab instead of the 22.04 one.
So I used these commands:
```bash
wget https://repo.radeon.com/amdgpu-install/6.4.1/ubuntu/jammy/amdgpu-install_6.4.60401-1_all.deb
sudo apt install ./amdgpu-install_6.4.60401-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME  # Add the current user to the render and video groups
sudo apt install rocm
```
If there's anything in my terminal history you want to see, I can try to find it using `history`, but I can't give many details right now because it's been a while since I did this configuration.
But if there's anything I can help with, I'll try.
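For anyone following along, a quick smoke test after an install like the one above (a sketch, assuming a ROCm build of PyTorch in the active environment):

```python
# Post-install smoke test: confirm the ROCm PyTorch build actually sees the card.
# Note: after the usermod above you may need to log out and back in for the
# render/video group membership to take effect.
import torch

if torch.version.hip is None:
    print("This is not a ROCm build of torch")
elif torch.cuda.is_available():
    print("Detected:", torch.cuda.get_device_name(0))
else:
    print("ROCm torch installed but no GPU visible (check groups/driver)")
```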
Glad you got it working! Please let me know if there are other issues!
AMD is working now, closing this. Please re-open if there are any issues!