text-generation-inference
Modifications to run with GPU on AMD w/ ROCm
I'd like to put some effort into getting this to run on my RX6900XT, could you suggest what areas of this codebase I'd need to review/revise to get AMD / ROCm working?
For all the flash attention models, I think you will not be able to make them work on AMD, but for all the other models it should just be a matter of getting the correct version of torch, right? I will ask colleagues at HF to see if they can give you a better answer.
I've been messing around with @yk's text-generation-inference docker images as part of Open Assistant and I can get PyTorch to recognise my RX6900XT by passing in the devices in the docker-compose.yml
devices:
- /dev/dri:/dev/dri
- /dev/kfd:/dev/kfd
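For reference, the same device passthrough can be expressed directly as docker run flags; here is a minimal sketch, where the worker image name is a placeholder and the --group-add video line is a common ROCm container recommendation rather than something required by this project:
# Pass the ROCm device nodes through to the container (equivalent to the compose snippet above).
docker run --rm -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  my-inference-worker-image  # placeholder image name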
and then forcing the uninstall of PyTorch, followed by the install of PyTorch w/ ROCm...
python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1 | torch version: 2.0.0+cu118
# open-assistant-inference-worker-1 | Torch CUDA: False
echo "Installing PyTorch + ROCm... "
pip uninstall -y torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1 | torch version: 2.0.0+rocm5.4.2
# open-assistant-inference-worker-1 | Torch CUDA: True
That could be rolled upstream into the Dockerfile under the PyTorch installer, if it doesn't break CUDA (I don't have an Nvidia CUDA device I could test it on).
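One hedged way to avoid touching CUDA builds would be to guard the wheel swap behind a variable; a sketch, where GPU_VENDOR is a hypothetical build argument (not an existing flag in this repo) and rocm5.4.2 is just the wheel index used above:
# Hypothetical guard so the ROCm wheel swap only runs when explicitly requested.
if [ "${GPU_VENDOR:-nvidia}" = "amd" ]; then
    pip uninstall -y torch
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
fi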
Unfortunately, it looks like I'll also have to sort out whether any of the ROCm forks of bitsandbytes can work, to get past this error for quantization:
open-assistant-inference-worker-1 | 2023-04-22T00:32:22.420703Z ERROR text_generation_launcher: Shard 0 failed to start:
open-assistant-inference-worker-1 | /opt/miniconda/envs/text-generation/lib/python3.9/site-packages/bitsandbytes/cextension.py:127: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
open-assistant-inference-worker-1 | warn("The installed version of bitsandbytes was compiled without GPU support. "
open-assistant-inference-worker-1 | We're not using custom kernels.
Loading checkpoint shards: 0%| | 0/3 [00:02<?, ?it/s]
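Until a ROCm-capable bitsandbytes build is sorted out, one way to sidestep that error might be to simply not request 8-bit quantization, assuming the model fits in VRAM at full/half precision; a sketch, assuming the shard is started via text-generation-launcher and using an illustrative model id:
# Launch without "--quantize bitsandbytes" so the bitsandbytes GPU path is never hit.
# Model id and port are illustrative.
text-generation-launcher \
  --model-id bigscience/bloom-560m \
  --port 8080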
My TGI images probably need updating, given how fast everything has been progressing. I suggest we either replace them with the original HF TGI images (as they now have all the functionality we need), or that you use the worker-hf container instead of the worker-full.
I'm kind of new to this, but what are TGI images?
I presume there is still work to be done by bitsandbytes to get ROCm working on this project?
> I'm kind of new to this, but what are TGI images?
We mean the docker images recommended to run this project (it makes everything smoother to use)
https://github.com/huggingface/text-generation-inference#get-started
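For context, the get-started guide runs the official image roughly like this (a CUDA example; the tag and model id below are illustrative):
# Typical invocation of the official TGI image on an Nvidia/CUDA host.
model=bigscience/bloom-560m
volume=$PWD/data  # cache model weights outside the container
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model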
@deftdawg Hi, I want to know if you have already successfully deployed TGI on ROCm devices?
Does TGI work on MI100?
It's not a target and seems unlikely given the amount of CUDA-specific kernels, but I know barely anything about those right now.
Support has been added officially for Instinct MI210 and MI250: https://github.com/huggingface/text-generation-inference/pull/1243
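With that merged, running on an Instinct card should look roughly like the CUDA example above but with the ROCm device flags from earlier in this thread; a sketch, where the "-rocm" image tag and the model id are assumptions to check against the docs:
# ROCm sketch: pass the GPU device nodes instead of --gpus all.
model=bigscience/bloom-560m
volume=$PWD/data
docker run --device=/dev/kfd --device=/dev/dri --group-add video \
  --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest-rocm \
  --model-id $model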
Closing in favor of https://github.com/huggingface/text-generation-inference/issues/223