
AMD ROCm Support

Open deep1401 opened this issue 7 months ago • 13 comments

Adding AMD ROCm Support throughout Transformer Lab app

deep1401 avatar May 01 '25 22:05 deep1401

Notes

  • Need to make vLLM work; a Ray error currently blocks it on AMD.
  • Nanotron requires flash-attn, and the flash-attn build currently breaks for us; this needs investigation because the error is vague.
  • The Unsloth GRPO Trainer won't work because it uses bitsandbytes.
  • Multi-GPU trainers that run through Accelerate use mixed precision by default, which doesn't work well on AMD devices; we need to figure this out.
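For the last point, disabling mixed precision could be sketched like this, assuming Accelerate's standard API (the `make_accelerator` helper name is illustrative, not part of Transformer Lab; the guard lets the snippet run even where `accelerate` isn't installed):

```python
def make_accelerator():
    """Build an Accelerator with fp16/bf16 autocast fully disabled,
    as a workaround for mixed-precision issues on AMD GPUs."""
    try:
        from accelerate import Accelerator
        # mixed_precision="no" forces full-precision training
        return Accelerator(mixed_precision="no")
    except Exception:
        return None  # accelerate missing or not usable in this environment
```

Whether forcing full precision is an acceptable trade-off (memory and speed cost) would need benchmarking on the target AMD hardware.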

deep1401 avatar May 01 '25 22:05 deep1401

Current blockers:

  • WSL does not support rocm-smi, which makes usage tracking difficult.
  • The PyTorch 2.7 + ROCm 6.3 binary doesn't work on WSL: torch.cuda.is_available() returns False even though torch.version.hip is set.
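A quick diagnostic for the second symptom, assuming a standard PyTorch install (the `rocm_torch_status` helper name is illustrative): on a working ROCm build, torch.version.hip is set and torch.cuda.is_available() is True, because ROCm reuses the CUDA device API.

```python
def rocm_torch_status():
    """Report whether torch sees a HIP build and a usable device."""
    try:
        import torch
    except ImportError:
        return {"torch": False}
    return {
        "torch": True,
        # non-None on ROCm builds of PyTorch, None on CUDA/CPU builds
        "hip_version": getattr(torch.version, "hip", None),
        # on the broken WSL setup this is False despite hip_version being set
        "cuda_available": torch.cuda.is_available(),
    }
```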

deep1401 avatar May 16 '25 15:05 deep1401

Just sharing this link in case it has information that might help resolve the issue; I don't have enough knowledge to determine whether it will help.

https://www.reddit.com/r/LocalLLaMA/comments/1kp6gdv/rocm_64_current_unsloth_working/

charmandercha avatar May 18 '25 01:05 charmandercha

Thanks for the link @charmandercha! I was able to resolve this with some advice from the issue here. Unfortunately we can't track GPU usage currently, since rocm-smi doesn't work on WSL with an AMD GPU, but we did get everything else running. Just waiting on a couple of smoke tests now.

deep1401 avatar May 20 '25 16:05 deep1401

Thanks for the feedback. I'll take a closer look at some point and see if I can get the installation right on Linux, as I don't use Windows.

charmandercha avatar May 21 '25 12:05 charmandercha

To anyone tracking this:

We now support AMD GPUs - install instructions here :) Please provide some feedback and we'd be happy to keep improving on this!

(cc @charmandercha)

deep1401 avatar May 23 '25 20:05 deep1401

Thank you! I read the docs, and Transformer Lab says flash attention isn't supported on my machine.

It seems to find a GPU but doesn't report any information about it, such as memory capacity.

GPU: ✅ | CUDA Version: n/a | Python MPS: ❌ | Flash Attention: ❌ | Flash Attn Version: n/a

Is there something wrong with my installation, or has the interface not yet been updated to show information about my card? (It's okay if that's the case, I just want to understand what's going on.)

Is there anything specific I should test for you?

I'm a Linux user, on Pop!_OS 22.04 LTS.

The rocm-smi package installation seems OK.

rocminfo: ROCk module is loaded

charmandercha avatar May 23 '25 21:05 charmandercha

Hi, we recommend using this on native Ubuntu (22.04 or 24.04 for now, since that's what has been tested). Pop!_OS has been having some issues with the newer ROCm bare-metal installations (ref).

deep1401 avatar May 23 '25 21:05 deep1401

I see, this means the entire interface should already be showing information about flash attention, the AMD GPU name, and everything else, right?

I just wanted to confirm, because if the interface is already fully configured I can try to resolve it over the weekend, and then I'll know through visual feedback that things are working properly.

charmandercha avatar May 23 '25 21:05 charmandercha

Everything except flash-attn should be shown! We got rid of flash-attn, so that "not installed" message can be disregarded, and we'll remove it soon. We use pyrsmi, so if you'd like to debug your system to see whether the GPU is detected, you can use that package directly. The interface should work as well if everything is in order.
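Such a pyrsmi check could be sketched as follows, assuming its `rocml` interface (`smi_initialize`, `smi_get_device_count`, `smi_get_device_name`); the `detect_amd_gpus` helper name is illustrative, and the function returns None wherever pyrsmi or a usable ROCm stack is absent:

```python
def detect_amd_gpus():
    """Return a list of detected AMD GPU names, or None if detection fails."""
    try:
        from pyrsmi import rocml
    except ImportError:
        return None  # pyrsmi not installed
    try:
        rocml.smi_initialize()
        count = rocml.smi_get_device_count()
        names = [rocml.smi_get_device_name(i) for i in range(count)]
        rocml.smi_shutdown()
        return names
    except Exception:
        return None  # ROCm runtime not usable on this machine
```

An empty list would mean pyrsmi initialized but found no devices, which points at a driver/permissions problem rather than a Transformer Lab one.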

deep1401 avatar May 23 '25 22:05 deep1401

Could you answer a question for me?

You're going to remove flash attention? Wasn't it supposed to speed things up? Is it already outdated?

charmandercha avatar May 28 '25 10:05 charmandercha

It ran on Pop!_OS. I just had to remove all the folders from my previous transformer-lab installation, and now even the AMD icon is rendered.

In the instructions you said that if those two specific packages ran, transformer-lab would probably run; they were working, so I removed everything that was there before and redid the installation.

I installed ROCm on my machine using AMD's official instructions, but it's been a while. I remember the first error I got was because I selected the 24.04 tab instead of the 22.04 one.

So I used these commands:

wget https://repo.radeon.com/amdgpu-install/6.4.1/ubuntu/jammy/amdgpu-install_6.4.60401-1_all.deb
sudo apt install ./amdgpu-install_6.4.60401-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME  # Add the current user to the render and video groups
sudo apt install rocm

If there's anything in my terminal history you want to see, I can try to find it with "history", but I can't give many details now because it's been a while since I did this configuration.

But if there is something I can help with, I'll try.
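A post-install sanity check for steps like the above could look like this, assuming a POSIX system; it verifies the ROCm CLI tools are on PATH and that the user landed in the render/video groups (the `rocm_install_check` helper name is illustrative):

```python
import shutil

def rocm_install_check():
    """Check for ROCm CLI tools on PATH and render/video group membership."""
    status = {t: shutil.which(t) is not None for t in ("rocminfo", "rocm-smi")}
    try:
        import grp, os
        user = os.environ.get("USER", "")
        # supplementary groups only; the primary group is not in gr_mem
        groups = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
        status["in_render_video"] = {"render", "video"} <= groups
    except ImportError:
        status["in_render_video"] = None  # grp is POSIX-only
    return status
```

Note that group membership changes from usermod only take effect after logging out and back in, which is a common reason rocminfo works for root but not for the regular user.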

charmandercha avatar May 28 '25 10:05 charmandercha

Glad you got it working! Please let me know if there are other issues!

deep1401 avatar May 28 '25 17:05 deep1401

AMD is working now, closing this. Please re-open if there are any issues!

deep1401 avatar Jun 26 '25 14:06 deep1401