text-generation-inference
Modifications to run with GPU on AMD w/ ROCm
I'd like to put some effort into getting this to run on my RX6900XT, could you suggest what areas of this codebase I'd need to review/revise to get AMD / ROCm working?
For all the flash attention models, I think you will not be able to make them work on AMD, but for all the other models it should just be a matter of getting the correct version of torch, right? I will ask colleagues at HF to see if they can give you a better answer.
I've been messing around with @yk's text-generation-inference docker images as part of Open Assistant and I can get PyTorch to recognise my RX6900XT by passing in the devices in the docker-compose.yml
devices:
- /dev/dri:/dev/dri
- /dev/kfd:/dev/kfd
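For reference, the same device passthrough can be expressed directly as docker run flags; here is a minimal sketch, where the worker image name is a placeholder and the --group-add video line is a common ROCm container recommendation rather than something required by this project:
# Pass the ROCm device nodes through to the container (equivalent to the compose snippet above).
docker run --rm -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  my-inference-worker-image  # placeholder image name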
and then forcing the uninstall of PyTorch, followed by the install of PyTorch w/ ROCm...
python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1 | torch version: 2.0.0+cu118
# open-assistant-inference-worker-1 | Torch CUDA: False
echo "Installing PyTorch + ROCm... "
pip uninstall -y torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
python -c "import torch; print('torch version: ', torch.__version__); print('Torch CUDA: ', torch.cuda.is_available());"
# open-assistant-inference-worker-1 | torch version: 2.0.0+rocm5.4.2
# open-assistant-inference-worker-1 | Torch CUDA: True
That could be rolled upstream into the Dockerfile under the PyTorch installer, if it doesn't break CUDA (I don't have an Nvidia CUDA device I could test it on).
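One hedged way to avoid touching CUDA builds would be to guard the wheel swap behind a variable; a sketch, where GPU_VENDOR is a hypothetical build argument (not an existing flag in this repo) and rocm5.4.2 is just the wheel index used above:
# Hypothetical guard so the ROCm wheel swap only runs when explicitly requested.
if [ "${GPU_VENDOR:-nvidia}" = "amd" ]; then
    pip uninstall -y torch
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
fi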
Unfortunately, it looks like I'll also have to sort out whether any of the ROCm forks of bitsandbytes can work, to get past this error for quantization:
open-assistant-inference-worker-1 | 2023-04-22T00:32:22.420703Z ERROR text_generation_launcher: Shard 0 failed to start:
open-assistant-inference-worker-1 | /opt/miniconda/envs/text-generation/lib/python3.9/site-packages/bitsandbytes/cextension.py:127: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
open-assistant-inference-worker-1 | warn("The installed version of bitsandbytes was compiled without GPU support. "
open-assistant-inference-worker-1 | We're not using custom kernels.
Loading checkpoint shards: 0%| | 0/3 [00:02<?, ?it/s]
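Until a ROCm-capable bitsandbytes build is sorted out, one way to sidestep that error might be to simply not request 8-bit quantization, assuming the model fits in VRAM at full/half precision; a sketch, assuming the shard is started via text-generation-launcher and using an illustrative model id:
# Launch without "--quantize bitsandbytes" so the bitsandbytes GPU path is never hit.
# Model id and port are illustrative.
text-generation-launcher \
  --model-id bigscience/bloom-560m \
  --port 8080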
My TGI images probably need updating, given how fast everything has been progressing. I suggest we either replace them with the original HF TGI images (as they now have all the functionality we need), or that you use the worker-hf container instead of the worker-full.
I'm kind of new to this, but what are TGI images?
I presume there is still work to be done by bitsandbytes to get ROCm working on this project?
> I'm kind of new to this, but what are TGI images?
We mean the docker images recommended to run this project (it makes everything smoother to use)
https://github.com/huggingface/text-generation-inference#get-started
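For context, the get-started guide runs the official image roughly like this (a CUDA example; the tag and model id below are illustrative):
# Typical invocation of the official TGI image on an Nvidia/CUDA host.
model=bigscience/bloom-560m
volume=$PWD/data  # cache model weights outside the container
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model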
@deftdawg Hi, I want to know if you have already successfully deployed TGI on ROCm devices?
Does TGI work on MI100?
It's not a target and seems unlikely given the amount of CUDA-specific kernels, but I know barely anything about those right now.
Support has been added officially for Instinct MI210 and MI250: https://github.com/huggingface/text-generation-inference/pull/1243
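With that merged, running on an Instinct card should look roughly like the CUDA example above but with the ROCm device flags from earlier in this thread; a sketch, where the "-rocm" image tag and the model id are assumptions to check against the docs:
# ROCm sketch: pass the GPU device nodes instead of --gpus all.
model=bigscience/bloom-560m
volume=$PWD/data
docker run --device=/dev/kfd --device=/dev/dri --group-add video \
  --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest-rocm \
  --model-id $model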
Closing in favor of https://github.com/huggingface/text-generation-inference/issues/223