
RTX 4090 GPU is not yet supported in this version of the container

Open · Sammers21 opened this issue 3 years ago · 6 comments

➜  fauxpilot git:(main) ./launch.sh
[+] Building 0.6s (16/16) FINISHED
 => [fauxpilot-copilot_proxy internal] load .dockerignore                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                        0.0s
 => [fauxpilot-copilot_proxy internal] load build definition from proxy.Dockerfile                                                                                                     0.0s
 => => transferring dockerfile: 307B                                                                                                                                                   0.0s
 => [fauxpilot-triton internal] load build definition from triton.Dockerfile                                                                                                           0.0s
 => => transferring dockerfile: 325B                                                                                                                                                   0.0s
 => [fauxpilot-triton internal] load .dockerignore                                                                                                                                     0.1s
 => => transferring context: 2B                                                                                                                                                        0.0s
 => [fauxpilot-copilot_proxy internal] load metadata for docker.io/library/python:3.10-slim-buster                                                                                     0.6s
 => [fauxpilot-triton internal] load metadata for docker.io/moyix/triton_with_ft:22.09                                                                                                 0.5s
 => [fauxpilot-triton 1/3] FROM docker.io/moyix/triton_with_ft:22.09@sha256:5a15c1f29c6b018967b49c588eb0ea67acbf897abb7f26e509ec21844574c9b1                                           0.0s
 => CACHED [fauxpilot-triton 2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116                             0.0s
 => CACHED [fauxpilot-triton 3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate                                                       0.0s
 => [fauxpilot-copilot_proxy] exporting to image                                                                                                                                       0.0s
 => => exporting layers                                                                                                                                                                0.0s
 => => writing image sha256:fe66a1b2589e744d9ede040faf3bbaf367c649f9933dd81614571c0dc5467588                                                                                           0.0s
 => => naming to docker.io/library/fauxpilot-triton                                                                                                                                    0.0s
 => => writing image sha256:7fbc7cbfd210262842109e585872d78dda92227fb399215904c45deea5629df3                                                                                           0.0s
 => => naming to docker.io/library/fauxpilot-copilot_proxy                                                                                                                             0.0s
 => [fauxpilot-copilot_proxy 1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:520537d39498addbb048e847381108f52659330c0e13438cccb45311395cc870                               0.0s
 => [fauxpilot-copilot_proxy internal] load build context                                                                                                                              0.0s
 => => transferring context: 1.10kB                                                                                                                                                    0.0s
 => CACHED [fauxpilot-copilot_proxy 2/5] WORKDIR /python-docker                                                                                                                        0.0s
 => CACHED [fauxpilot-copilot_proxy 3/5] COPY copilot_proxy/requirements.txt requirements.txt                                                                                          0.0s
 => CACHED [fauxpilot-copilot_proxy 4/5] RUN pip3 install --no-cache-dir -r requirements.txt                                                                                           0.0s
 => CACHED [fauxpilot-copilot_proxy 5/5] COPY copilot_proxy .                                                                                                                          0.0s
[+] Running 2/0
 ⠿ Container fauxpilot-copilot_proxy-1  Created                                                                                                                                        0.0s
 ⠿ Container fauxpilot-triton-1         Created                                                                                                                                        0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
fauxpilot-triton-1         |
fauxpilot-triton-1         | =============================
fauxpilot-triton-1         | == Triton Inference Server ==
fauxpilot-triton-1         | =============================
fauxpilot-triton-1         |
fauxpilot-triton-1         | NVIDIA Release 22.06 (build 39726160)
fauxpilot-triton-1         | Triton Server Version 2.23.0
fauxpilot-triton-1         |
fauxpilot-triton-1         | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot-triton-1         |
fauxpilot-triton-1         | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot-triton-1         |
fauxpilot-triton-1         | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot-triton-1         | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot-triton-1         | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot-triton-1         | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4090 GPU, which is not yet supported in this version of the container
fauxpilot-triton-1         | ERROR: No supported GPU(s) detected to run this container
fauxpilot-triton-1         |
fauxpilot-copilot_proxy-1  | INFO:     Started server process [1]
fauxpilot-copilot_proxy-1  | INFO:     Waiting for application startup.
fauxpilot-copilot_proxy-1  | INFO:     Application startup complete.
fauxpilot-copilot_proxy-1  | INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-triton-1         | I0321 20:35:31.923611 88 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x204e00000' with size 268435456
fauxpilot-triton-1         | I0321 20:35:31.923725 88 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot-triton-1         | I0321 20:35:31.925317 88 model_repository_manager.cc:1191] loading: py-model:1
fauxpilot-triton-1         | I0321 20:35:32.028166 88 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: py-model_0 (CPU device 0)
fauxpilot-triton-1         | Cuda available? True
fauxpilot-triton-1         | is_half: True, int8: True, auto_device_map: True

How is the RTX 4090 not supported if there is an example of someone using it in the matrix?

Sammers21 avatar Mar 21 '23 20:03 Sammers21

Hello there, thanks for opening your first issue. We welcome you to the FauxPilot community!

github-actions[bot] avatar Mar 21 '23 20:03 github-actions[bot]

Yes – the current container is getting a bit old. There is a branch open right now to update to a newer version of Triton (and hence a newer version of CUDA) but it needs a bit more testing before we can merge.

moyix avatar Mar 26 '23 20:03 moyix
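
For context on why the container prints that warning: it is a version-compatibility check, not a hardware fault. Below is a minimal illustrative sketch of the logic involved (not FauxPilot or NVIDIA code; the function and table names are made up, and the CUDA minimums are from NVIDIA's release notes as best I recall them). RTX 40-series (Ada) cards report compute capability 8.9, which was first supported in CUDA 11.8, while the 22.06-era NGC image was built against CUDA 11.7.

```python
# Sketch: why a container built against an older CUDA rejects newer GPUs.
# MIN_CUDA_FOR_CC maps a GPU compute capability to the first CUDA release
# that supports it (values from NVIDIA release notes, to my recollection).
MIN_CUDA_FOR_CC = {
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30-series)
    (8, 9): (11, 8),  # Ada Lovelace (GeForce RTX 40-series)
    (9, 0): (11, 8),  # Hopper (H100)
}

def container_supports(container_cuda, compute_cap):
    """True if a container built against `container_cuda` supports a GPU
    of the given compute capability (ignoring PTX forward compatibility)."""
    return container_cuda >= MIN_CUDA_FOR_CC[compute_cap]

# The 22.06 image ships CUDA 11.7, so an RTX 4090 (cc 8.9) gets flagged:
print(container_supports((11, 7), (8, 9)))  # False -> "not yet supported"
print(container_supports((11, 8), (8, 9)))  # True
```

This is also why updating to a newer Triton base image (and hence a newer CUDA) resolves the warning.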

Same on RTX 4080, fails to launch:

 => [2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116                       188.7s
 => [3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate                                                  11.9s
 => exporting to image                                                                                                                                    21.8s
 => => exporting layers                                                                                                                                   21.8s
 => => writing image sha256:62d84f9ae8d8972163cc9f4af7b36307d059352727fe283dc4e8f5aee7ebfd4e                                                               0.0s
 => => naming to docker.io/library/fauxpilot-triton                                                                                                        0.0s
[+] Running 3/3
 ✔ Network fauxpilot_default            Created                                                                                                            0.1s 
 ✔ Container fauxpilot-copilot_proxy-1  Created                                                                                                            0.1s 
 ✔ Container fauxpilot-triton-1         Created                                                                                                            0.1s 
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

Edit: I was missing nvidia-docker (the NVIDIA Container Toolkit).

vadi2 avatar Apr 07 '23 09:04 vadi2

I get the same warning (RTX 4080), however it does actually work:

fauxpilot-triton-1         | =============================
fauxpilot-triton-1         | == Triton Inference Server ==
fauxpilot-triton-1         | =============================
fauxpilot-triton-1         | 
fauxpilot-triton-1         | NVIDIA Release 22.06 (build 39726160)
fauxpilot-triton-1         | Triton Server Version 2.23.0
fauxpilot-triton-1         | 
fauxpilot-triton-1         | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot-triton-1         | 
fauxpilot-triton-1         | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot-triton-1         | 
fauxpilot-triton-1         | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot-triton-1         | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot-triton-1         | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot-triton-1         | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4080 GPU, which is not yet supported in this version of the container
fauxpilot-triton-1         | ERROR: No supported GPU(s) detected to run this container
fauxpilot-triton-1         | 
fauxpilot-copilot_proxy-1  | INFO:     Started server process [1]
fauxpilot-copilot_proxy-1  | INFO:     Waiting for application startup.
fauxpilot-copilot_proxy-1  | INFO:     Application startup complete.
fauxpilot-copilot_proxy-1  | INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-triton-1         | I0407 09:10:42.518966 88 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f34f8000000' with size 268435456
fauxpilot-triton-1         | I0407 09:10:42.519269 88 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot-triton-1         | I0407 09:10:42.531040 88 model_repository_manager.cc:1191] loading: fastertransformer:1
fauxpilot-triton-1         | I0407 09:10:42.739821 88 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
fauxpilot-triton-1         | I0407 09:10:42.739838 88 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.10
fauxpilot-triton-1         | I0407 09:10:42.739842 88 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.10
fauxpilot-triton-1         | I0407 09:10:42.739875 88 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
// ...
fauxpilot-triton-1         | I0407 09:10:42.911052 88 libfastertransformer.cc:307] Before Loading Model:
fauxpilot-triton-1         | after allocation, free 14.62 GB total 15.70 GB
fauxpilot-triton-1         | [WARNING] gemm_config.in is not found; using default GEMM algo
fauxpilot-triton-1         | I0407 09:11:18.353777 88 libfastertransformer.cc:321] After Loading Model:
fauxpilot-triton-1         | after allocation, free 1.10 GB total 15.70 GB
fauxpilot-triton-1         | I0407 09:11:18.355366 88 libfastertransformer.cc:537] Model instance is created on GPU NVIDIA GeForce RTX 4080
fauxpilot-triton-1         | I0407 09:11:18.358094 88 model_repository_manager.cc:1345] successfully loaded 'fastertransformer' version 1
fauxpilot-triton-1         | I0407 09:11:18.358381 88 server.cc:556] 
$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"prompt":"def hello","max_tokens":100,"temperature":0.1,"stop":["\n\n"]}' http://localhost:5000/v1/engines/codegen/completions | jq
{
  "id": "cmpl-h1jvz164uehkbl0tufObee0ZFUDqe",
  "model": "codegen",
  "object": "text_completion",
  "created": 1680859410,
  "choices": [
    {
      "text": "(self):\n        return \"Hello World\"",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "completion_tokens": 11,
    "prompt_tokens": 2,
    "total_tokens": 13
  }
}

That said, cuDNN 8.8 has major improvements for the 40-series cards; it would be nice to get that library updated.

vadi2 avatar Apr 07 '23 09:04 vadi2
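
The curl request in the comment above can also be made from Python's standard library. This is just an illustrative sketch: `build_payload` and `complete` are made-up helper names, not part of FauxPilot, and it assumes the proxy is listening on localhost:5000 as in the logs.

```python
import json
import urllib.request

# Endpoint from the Uvicorn logs above (assumed default port).
API = "http://localhost:5000/v1/engines/codegen/completions"

def build_payload(prompt, **overrides):
    """Mirror the fields of the curl request in the thread."""
    payload = {"prompt": prompt, "max_tokens": 100,
               "temperature": 0.1, "stop": ["\n\n"]}
    payload.update(overrides)
    return payload

def complete(prompt, url=API, **overrides):
    """POST a completion request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt, **overrides)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# e.g. complete("def hello")["choices"][0]["text"]
```

Because the proxy exposes an OpenAI-compatible completions API, any OpenAI-style client should work against it the same way.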

Getting this with a 4070 as well.

[triton]        |
[triton]        | =============================
[triton]        | == Triton Inference Server ==
[triton]        | =============================
[triton]        |
[triton]        | NVIDIA Release 22.06 (build 39726160)
[triton]        | Triton Server Version 2.23.0
[triton]        |
[triton]        | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
[triton]        |
[triton]        | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
[triton]        |
[triton]        | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
[triton]        | By pulling and using the container, you accept the terms and conditions of this license:
[triton]        | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
[triton]        | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4070 Laptop GPU GPU, which is not yet supported in this version of the container
[triton]        | ERROR: No supported GPU(s) detected to run this container
[triton]        |

iameli avatar Jan 31 '24 21:01 iameli

Where is moyix/triton_with_ft:22.09 built from? I see it on Docker Hub but I can't find the Dockerfile anywhere to try updating stuff.

iameli avatar Jan 31 '24 21:01 iameli