
AMD Support

GrahamboJangles opened this issue Apr 01 '23 · 8 comments

Does this work on AMD cards? What are the GPU requirements for inference?

GrahamboJangles · Apr 01 '23

Please refer to https://github.com/lm-sys/FastChat#vicuna-weights and https://github.com/lm-sys/FastChat#serving. We only use PyTorch, so it should be easy to port to AMD if PyTorch supports AMD well.
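
For what it's worth, ROCm builds of PyTorch reuse the torch.cuda API, so a quick check like this should tell you whether your build supports AMD (a minimal sketch, assuming a ROCm build of PyTorch):

    import torch

    # torch.version.hip is only set on ROCm builds (None on CUDA/CPU builds),
    # and AMD GPUs are exposed through the CUDA-compatible torch.cuda API.
    print(torch.version.hip)          # e.g. "5.4.22803", None otherwise
    print(torch.cuda.is_available())  # True if the AMD GPU is usable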

merrymercy · Apr 04 '23

You can attempt to use ROCm on Linux, but it's far from a smooth experience.

Askejm · Apr 04 '23

I tried it out on my RX 7900 XTX and it loaded the whole Vicuna 13B model in 8-bit mode into VRAM, but it segfaulted after loading the checkpoint shards.

(I'm guessing that's because the card isn't officially supported by ROCm yet 😅: https://github.com/RadeonOpenCompute/ROCm/issues/1973)

I also set up the ROCm + Vicuna development environment using a Nix flake, but there are a few more tweaks I want to make before publishing it (e.g. writing Nix packages for accelerate and gradio).

kira-bruneau · Apr 15 '23

:tada: I managed to get this running on my RX 7900 XTX!! I just tracked down all the development commits that added gfx11 support to ROCm and built it all from source.

I pushed my Nix flake here if anyone wants to try it out themselves: https://github.com/kira-bruneau/FastChat/commit/75235dac0365e11157dbd950bc1a4cf528f8ddc6

(I have it hard-coded to target gfx803 & gfx1100, so you might want to change that if you have a different AMD card: https://github.com/kira-bruneau/FastChat/commit/75235dac0365e11157dbd950bc1a4cf528f8ddc6#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R24)

Steps:

  1. Install Nix
  2. Enable Nix flakes
  3. Load the development environment (this will build ROCm and PyTorch and will take multiple hours, so I recommend letting it run overnight):

         nix develop github:kira-bruneau/FastChat/gfx1100

  4. Run the model:

         python -m fastchat.serve.cli --model-path <path-to-model> --num-gpus 1 --load-8bit
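
If you want to confirm a from-source build like this actually targets your card, a quick check should work from inside the dev shell (a sketch, assuming a ROCm build of PyTorch):

    import torch

    # Lists the GPU architectures this PyTorch build was compiled for;
    # on ROCm builds these are gfx strings instead of sm_XX.
    print(torch.cuda.get_arch_list())  # expect something like ['gfx803', 'gfx1100']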


kira-bruneau · Apr 19 '23

I can confirm FastChat with the Vicuna 13B model runs fine in 8-bit mode on a single AMD 6800 card. System: Ubuntu 20.04 LTS; installed ROCm 5.4.2, then PyTorch with ROCm 5.4.2 support. No need to build from source; it works directly with all the official packages.
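
For anyone wondering about the memory requirements, here's my back-of-envelope math for why 13B fits (weights only; real usage is higher with activations and the KV cache):

    # Rough VRAM estimate for Vicuna 13B in 8-bit mode (weights only)
    params = 13e9          # 13 billion parameters
    bytes_per_weight = 1   # int8 quantization: one byte per weight
    print(f"~{params * bytes_per_weight / 1e9:.0f} GB")  # ~13 GB, fits in 16 GB VRAM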

Gaolaboratory · Apr 23 '23

Oh yep, sorry: for all other supported AMD cards you shouldn't need to build from source. I only had to because the RX 7900 XTX isn't supported in the release builds of ROCm yet.

kira-bruneau · Apr 25 '23

@kira-bruneau Is it still necessary to build from source for the RX 570 (gfx803)?

aseok · Jun 01 '23

@aseok Oh nope! It was only necessary before the ROCm 5.5 release to support gfx1100.

Although... there are still some problems in nixpkgs that mean parts still have to be compiled from source if you want to use the flake (see https://github.com/NixOS/nixpkgs/pull/230881). Right now the builder fails to cache rocfft, so you'd still have to compile PyTorch from source :disappointed:.

If you want to avoid building from source completely, I'd recommend using the official PyTorch releases (https://pytorch.org), or trying to find a Docker image set up for it (which would be a little more involved). Hopefully the fixes will get upstreamed soon, though!
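
For reference, installing the official ROCm wheels currently looks something like the command below; the rocm5.4.2 index URL is just what the current release uses, so check https://pytorch.org for the exact up-to-date command:

    pip3 install torch --index-url https://download.pytorch.org/whl/rocm5.4.2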

kira-bruneau · Jun 01 '23

@kira-bruneau Can someone create instructions to install + run with ROCm, please? It seems that there is no flag to make it run in ROCm mode.

fubuki4649 · Jun 18 '23

Perhaps there should be a note in the README about AMD compatibility?

I successfully reproduced @Gaolaboratory's results. I managed to run Vicuna 13B on my RX 6800 XT with the --load-8bit option, using some packages from my OS (Fedora 38).

@onyasumi

Can someone create instructions to install + run with ROCm please?

  1. Install the ROCm package for your OS. Fedora 38 example below:

     sudo dnf install rocm-opencl rocm-opencl-devel
    
  2. Follow PyTorch's instructions to get the ROCm version of PyTorch. Remember to verify that PyTorch detects your graphics card (see the quick check after this list).

    Side note: as of writing, Fedora 38 ships ROCm 5.5.1 while PyTorch's ROCm builds only go up to 5.4.2, but it still worked for me.

  3. Install FastChat according to the README.

  4. Download a model and try to use it in CLI.
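
For step 2, a minimal detection check looks like this (my sketch; ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API, so no special flag is needed):

    import torch

    print(torch.cuda.is_available())      # should print True
    print(torch.cuda.get_device_name(0))  # e.g. "AMD Radeon RX 6800 XT"
    x = torch.ones(3, device="cuda")      # run a tiny op on the GPU
    print((x * 2).sum().item())           # 6.0 if compute works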

It seems that there is no flag for it to run in ROCm mode.

If PyTorch isn't installed, running pip install fschat will try to install it. If you specifically install the ROCm version of PyTorch beforehand, pip will notice that and skip installing PyTorch during the FastChat install. As a result, FastChat will just use ROCm through your install of PyTorch.
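
One way to double-check that pip kept your ROCm build after installing fschat (a quick sanity check; ROCm wheels carry a +rocm suffix in their version string):

    import torch

    # ROCm wheels are versioned like "2.0.1+rocm5.4.2"; if the suffix is
    # missing or reads "+cuXXX", pip pulled in a non-ROCm build instead.
    print(torch.__version__)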

JonLiuFYI · Jun 29 '23

@JonLiuFYI could you contribute a pull request to add some notes about AMD?

merrymercy · Jul 05 '23

@JonLiuFYI Thank you, this seemed to work for me. I can add a PR later to document this in the README.

fubuki4649 · Jul 05 '23

@JonLiuFYI @onyasumi please go ahead. Thanks!

merrymercy · Jul 06 '23