AMD GPUs

Open NotNic182 opened this issue 1 year ago • 5 comments

So are people with AMD GPUs screwed? I literally just sold my NVIDIA card and bought a Radeon two days ago. I've been trying my hardest to get this damn thing to run, but no matter what I try on Windows or Linux (Xubuntu, to be more specific), it always seems to come back to a CUDA issue. So before I waste more time trying desperately to make this work, are there any tools that will let an AMD card be used, or how do I bypass it and just run it off my CPU? Any help would be great.

Some more specs of mine, just in case: Ryzen 5 5600, Radeon 6500, 32 GB RAM.

NotNic182 avatar Mar 04 '23 21:03 NotNic182

Check out the library: torch_directml

DirectML is a Windows library that should support AMD as well as NVIDIA GPUs.
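
As a rough illustration (not LLaMA-specific, and assuming the torch-directml package is installed via pip install torch-directml), targeting the DirectML device looks something like this:

```python
# Minimal sketch: run a PyTorch module on the DirectML device.
# Assumes: pip install torch-directml
import torch
import torch_directml

dml = torch_directml.device()            # default DirectML adapter (AMD or NVIDIA)
model = torch.nn.Linear(8, 8).to(dml)    # modules and tensors are moved with .to(dml)
x = torch.randn(1, 8).to(dml)
print(model(x))                          # forward pass executes on the DirectML device
```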

It looks like there would be a bit of work converting this project to use DirectML instead of CUDA. In particular, the model-parallel library doesn't appear to support DirectML, so that part might have to be ripped out and you'd have to be satisfied with running this on a single GPU.

Another route would be converting the model to ONNX and running it with ONNX Runtime using the DirectML execution provider.
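
That route would look roughly like the sketch below, assuming the onnxruntime-directml package is installed and that "llama.onnx" is a hypothetical exported model file (the actual input names depend on how the model is exported):

```python
# Sketch: run an exported ONNX model through ONNX Runtime's DirectML provider.
# Assumes: pip install onnxruntime-directml; "llama.onnx" is a placeholder path.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "llama.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
dummy_ids = np.array([[1, 2, 3]], dtype=np.int64)       # placeholder token IDs
outputs = session.run(None, {"input_ids": dummy_ids})   # input name depends on the export
print(outputs[0].shape)
```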

In conclusion, there are a variety of ways to get it to work but they require some coding. I hope someone will make a fork to support DirectML as I'm not sure quite how to get it right at the moment.

(BTW, yes, there are also CPU-only forks, but that seems a waste of your graphics card!)

elephantpanda avatar Mar 04 '23 23:03 elephantpanda

I did see VRAM usage increase, but I couldn't actually test text generation because my graphics card only has 8 GB of VRAM, which isn't enough to run this. https://github.com/lshqqytiger/llama-directml

lshqqytiger avatar Mar 05 '23 04:03 lshqqytiger

Works perfectly fine for me on Linux with a 6900 XT. https://github.com/oobabooga/text-generation-webui also makes it really easy.

Titaniumtown avatar Mar 05 '23 08:03 Titaniumtown

If you can't get LLaMA to work, try this: https://youtu.be/Bj4erD5NNa0

GrahamboJangles avatar Mar 12 '23 18:03 GrahamboJangles

Check this: TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' python launch.py --precision full --no-half. ROCm is AMD's compute stack with a CUDA-compatibility layer, so maybe it works for AMD GPUs, or at least it might be a simple way to run LLaMA on an AMD GPU.
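
For context, a quick sanity check along those lines, assuming the ROCm build of PyTorch is installed (e.g. via the --extra-index-url above): on ROCm, AMD GPUs are exposed through the regular torch.cuda API, so CUDA-targeted code can often run unmodified.

```python
# Sketch: verify that the ROCm build of PyTorch can see and use the AMD GPU.
import torch

if torch.cuda.is_available():
    print("GPU visible to PyTorch:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum())   # matmul runs on the AMD GPU via ROCm/HIP
else:
    print("No GPU found; PyTorch will fall back to CPU.")
```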

0x131315 avatar Apr 11 '23 09:04 0x131315

cc @jeffdaily for visibility. Closing this issue, but it would be great to have more programmatic AMD support for Llama 1/2.

jspisak avatar Sep 06 '23 17:09 jspisak