Modifications to run with GPU on AMD w/ ROCm

deftdawg opened this issue 2 years ago • 5 comments

I'd like to put some effort into getting this to run on my RX6900XT. Could you suggest what areas of this codebase I'd need to review/revise to get AMD / ROCm working?

deftdawg avatar Apr 21 '23 19:04 deftdawg

For all the flash attention models, I think you will not be able to make them work on AMD, but for all the other models it should just be a matter of getting the correct version of torch, right? I will ask colleagues at HF to see if they can give you a better answer.

OlivierDehaene avatar Apr 21 '23 19:04 OlivierDehaene

I've been messing around with @yk's text-generation-inference docker images as part of Open Assistant, and I can get PyTorch to recognise my RX6900XT by passing the devices through in docker-compose.yml:

    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
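
For context, those two lines sit inside a service definition roughly like this (the service and image names are placeholders, not my actual compose file):

    # Illustrative compose service; only the devices block comes from my setup.
    services:
      inference-worker:
        image: text-generation-inference:latest   # placeholder image name
        devices:
          - /dev/dri:/dev/dri   # GPU render nodes
          - /dev/kfd:/dev/kfd   # ROCm compute interface
        group_add:
          - video               # ROCm containers typically need this; assumption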

Then I force an uninstall of the stock PyTorch, followed by an install of PyTorch w/ ROCm:

python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1     | torch version:  2.0.0+cu118
# open-assistant-inference-worker-1     | Torch CUDA:  False

echo "Installing PyTorch + ROCm... "
pip uninstall -y torch

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1     | torch version:  2.0.0+rocm5.4.2
# open-assistant-inference-worker-1     | Torch CUDA:  True
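
(Side note: torch.cuda.is_available() returning True on a ROCm wheel is expected, since ROCm builds implement the CUDA API surface via HIP; torch.version.hip tells the two backends apart.)

    python -c "import torch; print('HIP version: ', torch.version.hip)"
    # ROCm builds print a version string (e.g. 5.4.x); CUDA builds print None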

That could be rolled upstream into the Dockerfile's PyTorch install step, if it doesn't break CUDA (I don't have an Nvidia CUDA device I could test it on).
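
Something like this hypothetical build-arg switch might do it (untested sketch; the arg name is made up, and the default keeps today's CUDA wheel index):

    # Untested Dockerfile sketch -- TORCH_INDEX_URL is a hypothetical build arg.
    ARG TORCH_INDEX_URL=https://download.pytorch.org/whl/cu118
    # ROCm builds would pass: --build-arg TORCH_INDEX_URL=https://download.pytorch.org/whl/rocm5.4.2
    RUN pip install --force-reinstall torch torchvision torchaudio --index-url ${TORCH_INDEX_URL}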

Unfortunately, it looks like I'll also have to sort out whether any of the ROCm forks of bitsandbytes work, to get past this quantization error:

open-assistant-inference-worker-1     | 2023-04-22T00:32:22.420703Z ERROR text_generation_launcher: Shard 0 failed to start:
open-assistant-inference-worker-1     | /opt/miniconda/envs/text-generation/lib/python3.9/site-packages/bitsandbytes/cextension.py:127: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
open-assistant-inference-worker-1     |   warn("The installed version of bitsandbytes was compiled without GPU support. "
open-assistant-inference-worker-1     | We're not using custom kernels.
Loading checkpoint shards:   0%|          | 0/3 [00:02<?, ?it/s]
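
Until that's sorted out, one possible stopgap is launching without --quantize so the bitsandbytes 8-bit path is never exercised; a sketch (the model id and port are just examples):

    # Sketch: run without quantization to avoid bitsandbytes entirely.
    # Model id and port are illustrative.
    text-generation-launcher --model-id bigscience/bloom-560m --port 8080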

deftdawg avatar Apr 22 '23 04:04 deftdawg

My TGI images probably need updating, given how fast everything has been progressing. I suggest we either replace them with the original HF TGI images (as they now have all the functionality we need) or you use the worker-hf container instead of the worker-full one.

yk avatar Apr 22 '23 08:04 yk

I'm kind of new to this, but what are TGI images?

I presume there is still work to be done on bitsandbytes to get ROCm working with this project?

hazrpg avatar May 20 '23 18:05 hazrpg

I'm kind of new to this, but what are TGI images?

We mean the Docker images recommended for running this project (they make everything smoother to use):

https://github.com/huggingface/text-generation-inference#get-started
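
The short version from that page looks roughly like this (the image tag and model id below are placeholders; check the README for current values):

    # Rough sketch of the documented invocation; tag and model are placeholders.
    model=bigscience/bloom-560m
    volume=$PWD/data
    docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
        ghcr.io/huggingface/text-generation-inference:latest --model-id $model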

Narsil avatar May 22 '23 12:05 Narsil

@deftdawg Hi, I want to know if you have already successfully deployed TGI on ROCm devices?

Daniel-NJ avatar Aug 16 '23 06:08 Daniel-NJ

Does TGI work on MI100?

ehartford avatar Sep 16 '23 05:09 ehartford

It's not a target and seems unlikely given the amount of CUDA-specific kernels, but I know barely anything about those right now.

Narsil avatar Sep 19 '23 07:09 Narsil

Support has been added officially for Instinct MI210 and MI250: https://github.com/huggingface/text-generation-inference/pull/1243
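
For anyone landing here: the ROCm images use the same /dev/kfd and /dev/dri passthrough discussed earlier in this thread. A sketch of a launch (the image tag and model id are illustrative; see the PR and docs for the exact ones):

    # Hedged sketch of a ROCm TGI launch; tag and model id are illustrative.
    docker run --device=/dev/kfd --device=/dev/dri --group-add video \
        --ipc=host --shm-size 1g -p 8080:80 -v $PWD/data:/data \
        ghcr.io/huggingface/text-generation-inference:latest-rocm \
        --model-id bigscience/bloom-560m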

Closing in favor of https://github.com/huggingface/text-generation-inference/issues/223

fxmarty avatar Nov 27 '23 13:11 fxmarty