
feat: Add Support for Devices other than CUDA in Telemetry and Web UI

Open cromefire opened this pull request 2 years ago • 15 comments

Realized via AMD hipBLAS (ROCm/HIP)

Fixes #636

cromefire avatar Nov 26 '23 18:11 cromefire

Okay, the GPU name is also in the UI now; it's more like the ID currently, but I've opened an issue with the library to get the marketing (user-recognizable) name: PTFOPlayer/rocm_lib#2

cromefire avatar Nov 26 '23 19:11 cromefire

Documentation is also there now; it should be okay-ish for a start.

cromefire avatar Nov 26 '23 20:11 cromefire

As for why I use rocminfo instead of a library: there's no Rust wrapper for libhsa-runtime64, and I tried quickly writing my own, but due to C being C it seemed to create more problems than it solved (and my Rust FFI and C skills aren't great). You'd then also have to link it, which is rather fragile, or load it at runtime, which is more of a guessing game. rocminfo, on the other hand, should be available whenever ROCm is installed, is on the PATH, and just works; worst case, if it isn't there, the GPUs simply don't show up in the health info. Maybe long term, at least on Linux, some sort of generic approach using the kernel interfaces is possible (via DRM or so).
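
To illustrate (field names as in a typical rocminfo dump, so the exact labels and spacing may vary between ROCm versions), the health info essentially only needs something like this from the tool:

    # List every HSA agent's marketing name and device type; the agents
    # reporting "Device Type: GPU" are what the health info displays.
    rocminfo | grep -E 'Marketing Name|Device Type'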

cromefire avatar Nov 28 '23 23:11 cromefire

Hey, considering the increasing size of this pull request, I've extracted #913 to focus on the basic ROCm support as a preliminary step.

wsxiaoys avatar Nov 29 '23 03:11 wsxiaoys

The other PR didn't add any docs, so I tried the command from this PR's documentation, but couldn't get it to work. Is there a trick I'm missing?

docker run -it \
  --device /dev/dri --device /dev/kfd \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby-rocm \
  serve --model TabbyML/StarCoder-1B --device rocm

Amzd avatar Dec 08 '23 15:12 Amzd

> The other PR didn't add any docs, so I tried the command from this PR's documentation, but couldn't get it to work. Is there a trick I'm missing?
>
>     docker run -it \
>       --device /dev/dri --device /dev/kfd \
>       -p 8080:8080 -v $HOME/.tabby:/data \
>       tabbyml/tabby-rocm \
>       serve --model TabbyML/StarCoder-1B --device rocm

The other PR didn't include the Docker image; you'll have to build it yourself.
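
Roughly like this, from a checkout of this branch (untested as written; the tag just has to match whatever you pass to docker run):

    # Build the ROCm image locally; rocm.Dockerfile is the same file the
    # compose example further down in this thread points at.
    docker build -f rocm.Dockerfile -t tabbyml/tabby-rocm .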

cromefire avatar Dec 08 '23 16:12 cromefire

Sadly, I am unable to run this Docker image:

$ docker-compose up 
Recreating tabby_tabby_1 ... done
Attaching to tabby_tabby_1
tabby_1  | 2023-12-09T17:02:11.296325Z  INFO tabby::serve: crates/tabby/src/serve.rs:105: Starting server, this might takes a few minutes...
tabby_1  | ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
tabby_1  | ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
tabby_1  | ggml_init_cublas: found 2 ROCm devices:
tabby_1  |   Device 0: AMD Radeon RX 6900 XT, compute capability 10.3
tabby_1  |   Device 1: AMD Radeon Graphics, compute capability 10.3
tabby_1  | ggml_cuda_set_main_device: using device 0 (AMD Radeon RX 6900 XT) as main device
tabby_1  | 
tabby_1  | CUDA error 98 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6779: invalid device function
tabby_1  | current device: 0
tabby_tabby_1 exited with code 1

my docker-compose.yml:

version: '3.5'

services:
  tabby:
    build:
      context: .
      dockerfile: rocm.Dockerfile
    command: serve --model TabbyML/StarCoder-1B --device rocm
    volumes:
      - "$HOME/Docker/tabby/data:/data"
    ports:
      - 8229:8080
    environment:
      - TABBY_DISABLE_USAGE_COLLECTION=1
    devices:
      - /dev/dri
      - /dev/kfd

Frozen-byte avatar Dec 09 '23 17:12 Frozen-byte

You'll probably need to remove your iGPU from the container; ROCm sometimes has problems with it (alternatively, I think the llama.cpp docs also mention an environment variable to make it use only a specific GPU).
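
If it's the variable I'm thinking of, something along these lines should work (untested sketch; HIP_VISIBLE_DEVICES is ROCm's counterpart to CUDA_VISIBLE_DEVICES, and the equivalent in a compose file would be an extra entry under environment:):

    # Keep both device nodes mounted but tell HIP to use only device 0,
    # which according to your log is the RX 6900 XT.
    docker run -it \
      --device /dev/dri --device /dev/kfd \
      -e HIP_VISIBLE_DEVICES=0 \
      -p 8080:8080 -v $HOME/.tabby:/data \
      tabbyml/tabby-rocm \
      serve --model TabbyML/StarCoder-1B --device rocm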

cromefire avatar Dec 09 '23 17:12 cromefire

Updating this from main is starting to get complicated now, because there are conflicts in a lot of places, so it would be great to get this in some time soon... I'm trying to fix it up, but I'm not sure how long I can keep doing this; it's quite time-consuming.

cromefire avatar Dec 10 '23 12:12 cromefire

> You'll probably need to remove your iGPU from the container; ROCm sometimes has problems with it (alternatively, I think the llama.cpp docs also mention an environment variable to make it use only a specific GPU).

Thank you! I can verify that pinning devices works perfectly now (1bc6f4801e2583c13ba81e69b2cffa7702fa8a89):

    devices:
      - /dev/dri/card0
      - /dev/dri/renderD128
      - /dev/kfd

Frozen-byte avatar Dec 10 '23 13:12 Frozen-byte

Also, the llama.cpp fork should probably be updated sooner rather than later, as there are some fixes for AMD GPU performance in there.
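
For reference, the bump would look roughly like this (the submodule path is taken from the build log above; which ref of the fork to pin is an assumption):

    # Point the vendored llama.cpp at a newer commit of the fork that
    # carries the AMD performance fixes, then record the new pin.
    cd crates/llama-cpp-bindings/llama.cpp
    git fetch origin && git checkout origin/master
    cd - && git add crates/llama-cpp-bindings/llama.cpp
    git commit -m "chore: update llama.cpp"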

cromefire avatar Dec 10 '23 14:12 cromefire

Hey @cromefire - I'd suggest first having a PR that checks in the Docker image / binary build; then we can review how to integrate the ROCm device information into the dashboard.

wsxiaoys avatar Dec 10 '23 14:12 wsxiaoys

Sure, I'll extract that. The changes to support more accelerators are the hardest part to maintain, though.

PS: Keep in mind that it soon won't be only ROCm device info, but probably oneAPI and/or Vulkan as well, which is why I made this a more generic Accelerator interface.

cromefire avatar Dec 10 '23 14:12 cromefire

Just resolved the conflicts. Maybe another friendly reminder that these changes suck up more and more time the longer they lie around, since I constantly have to resolve the conflicts...

cromefire avatar Dec 22 '23 13:12 cromefire

Super nice! When will the PR be merged?

JumpLink avatar Jan 05 '24 14:01 JumpLink