feat: Add Support for Devices other than CUDA in Telemetry and Web UI
Implemented via AMD hipBLAS (ROCm/HIP)
Fixes #636
Okay, the GPU name is also in the UI now. Currently it's more like an ID, but I've opened an issue for the library to get the marketing (user-recognizable) name: PTFOPlayer/rocm_lib#2
Documentation is also there now; it should be pretty okay-ish for a start.
As for why I use rocminfo instead of a library: there's no Rust wrapper for libhsa-runtime64, and when I tried quickly writing my own, C being C, it seemed to create more problems than it solved (and my Rust FFI and C skills aren't great). You'd then also have to link against it, which is rather fragile, or load it at runtime, which is more of a guessing game. rocminfo, on the other hand, should be available whenever ROCm is installed, is just on the PATH, and just works. And if it isn't there, worst case, the GPUs simply don't show up in the health info. Maybe long term, at least on Linux, some sort of generic approach using the kernel interfaces (DRM or similar) is possible.
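Roughly, the approach looks like this (an illustrative sketch, not the exact code in this PR; rocminfo prints one "Marketing Name:" line per HSA agent, and a real version would also filter agents by "Device Type: GPU"):

use std::process::Command;

// Shell out to rocminfo (on the PATH whenever ROCm is installed)
// instead of linking libhsa-runtime64, and fail soft if the binary
// is missing or errors out.
fn rocm_device_names() -> Vec<String> {
    let output = match Command::new("rocminfo").output() {
        Ok(o) if o.status.success() => o,
        // Worst case: no GPUs listed in the health info.
        _ => return Vec::new(),
    };
    String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter_map(|line| {
            // One "Marketing Name: ..." line per agent; filtering by
            // "Device Type: GPU" is omitted here for brevity.
            line.trim()
                .strip_prefix("Marketing Name:")
                .map(|name| name.trim().to_string())
        })
        .collect()
}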
Hey, considering the increasing size of this pull request, I have extracted #913 to focus on the basic ROCm support as a preliminary step.
The other PR didn't add any docs, and when I tried the command from this PR's documentation I couldn't get it to work. Is there a trick I'm missing?
docker run -it \
  --device /dev/dri --device /dev/kfd \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby-rocm \
  serve --model TabbyML/StarCoder-1B --device rocm
The other PR didn't include the Docker image; you'll have to build it yourself.
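Something along these lines should work, assuming you build from the repo root with the rocm.Dockerfile (the tag name is up to you):

docker build -f rocm.Dockerfile -t tabbyml/tabby-rocm .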
Sadly, I am unable to run this Docker image:
$ docker-compose up
Recreating tabby_tabby_1 ... done
Attaching to tabby_tabby_1
tabby_1 | 2023-12-09T17:02:11.296325Z INFO tabby::serve: crates/tabby/src/serve.rs:105: Starting server, this might takes a few minutes...
tabby_1 | ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
tabby_1 | ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
tabby_1 | ggml_init_cublas: found 2 ROCm devices:
tabby_1 | Device 0: AMD Radeon RX 6900 XT, compute capability 10.3
tabby_1 | Device 1: AMD Radeon Graphics, compute capability 10.3
tabby_1 | ggml_cuda_set_main_device: using device 0 (AMD Radeon RX 6900 XT) as main device
tabby_1 |
tabby_1 | CUDA error 98 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6779: invalid device function
tabby_1 | current device: 0
tabby_tabby_1 exited with code 1
my docker-compose.yml:
version: '3.5'
services:
  tabby:
    build:
      context: .
      dockerfile: rocm.Dockerfile
    command: serve --model TabbyML/StarCoder-1B --device rocm
    volumes:
      - "$HOME/Docker/tabby/data:/data"
    ports:
      - 8229:8080
    environment:
      - TABBY_DISABLE_USAGE_COLLECTION=1
    devices:
      - /dev/dri
      - /dev/kfd
You'll probably need to remove your iGPU from the container; ROCm sometimes has problems with that. (Alternatively, I think the llama.cpp docs also mention an environment variable to make it use only a specific GPU.)
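I believe that variable is HIP_VISIBLE_DEVICES (the ROCm counterpart of CUDA_VISIBLE_DEVICES), so in your compose file, assuming the 6900 XT enumerates as device 0, something like:

environment:
  # Only expose the first enumerated GPU to ROCm (hopefully the 6900 XT).
  - HIP_VISIBLE_DEVICES=0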
Updating this from main is starting to get complicated, because there are conflicts in a lot of places, so it would be great to get this in sometime soon... I'm trying to fix it up now, but I'm not sure how long I can keep doing this; it's quite time-consuming.
Thank you! I can verify that pinning devices works perfectly now (1bc6f4801e2583c13ba81e69b2cffa7702fa8a89):
devices:
  - /dev/dri/card0
  - /dev/dri/renderD128
  - /dev/kfd
Also, the llama.cpp fork should probably be updated sooner rather than later, as there are some fixes for AMD GPU performance in there.
Hey @cromefire - I would suggest first landing a PR that checks in the docker image / binary build; then we can review how to integrate the rocm device information into the dashboard.
Sure, I'll extract that. The changes to support more accelerators are the hardest to maintain, though.
PS: Also keep in mind that it won't only be ROCm device info for long; oneAPI and/or Vulkan will probably follow, which is why I made this a more generic Accelerator interface.
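Roughly, what I have in mind looks something like this (the names are illustrative, not the exact API in this PR):

// Hypothetical sketch of a backend-agnostic accelerator abstraction.
pub enum AcceleratorKind {
    Cuda,
    Rocm,
    OneApi,
    Vulkan,
}

pub trait Accelerator {
    /// Which backend this device belongs to.
    fn kind(&self) -> AcceleratorKind;
    /// Driver-level identifier (e.g. "gfx1030" for RDNA2 on ROCm).
    fn device_id(&self) -> String;
    /// Human-readable marketing name, when the backend can provide one.
    fn marketing_name(&self) -> Option<String>;
}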
Just resolved the conflicts. Another friendly reminder that these changes eat up more and more time the longer they sit around, since I constantly have to resolve conflicts...
Super nice! When will the PR be merged?