RTX 4090 GPU is not yet supported in this version of the container
➜ fauxpilot git:(main) ./launch.sh
[+] Building 0.6s (16/16) FINISHED
=> [fauxpilot-copilot_proxy internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [fauxpilot-copilot_proxy internal] load build definition from proxy.Dockerfile 0.0s
=> => transferring dockerfile: 307B 0.0s
=> [fauxpilot-triton internal] load build definition from triton.Dockerfile 0.0s
=> => transferring dockerfile: 325B 0.0s
=> [fauxpilot-triton internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [fauxpilot-copilot_proxy internal] load metadata for docker.io/library/python:3.10-slim-buster 0.6s
=> [fauxpilot-triton internal] load metadata for docker.io/moyix/triton_with_ft:22.09 0.5s
=> [fauxpilot-triton 1/3] FROM docker.io/moyix/triton_with_ft:22.09@sha256:5a15c1f29c6b018967b49c588eb0ea67acbf897abb7f26e509ec21844574c9b1 0.0s
=> CACHED [fauxpilot-triton 2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116 0.0s
=> CACHED [fauxpilot-triton 3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate 0.0s
=> [fauxpilot-copilot_proxy] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:fe66a1b2589e744d9ede040faf3bbaf367c649f9933dd81614571c0dc5467588 0.0s
=> => naming to docker.io/library/fauxpilot-triton 0.0s
=> => writing image sha256:7fbc7cbfd210262842109e585872d78dda92227fb399215904c45deea5629df3 0.0s
=> => naming to docker.io/library/fauxpilot-copilot_proxy 0.0s
=> [fauxpilot-copilot_proxy 1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:520537d39498addbb048e847381108f52659330c0e13438cccb45311395cc870 0.0s
=> [fauxpilot-copilot_proxy internal] load build context 0.0s
=> => transferring context: 1.10kB 0.0s
=> CACHED [fauxpilot-copilot_proxy 2/5] WORKDIR /python-docker 0.0s
=> CACHED [fauxpilot-copilot_proxy 3/5] COPY copilot_proxy/requirements.txt requirements.txt 0.0s
=> CACHED [fauxpilot-copilot_proxy 4/5] RUN pip3 install --no-cache-dir -r requirements.txt 0.0s
=> CACHED [fauxpilot-copilot_proxy 5/5] COPY copilot_proxy . 0.0s
[+] Running 2/0
⠿ Container fauxpilot-copilot_proxy-1 Created 0.0s
⠿ Container fauxpilot-triton-1 Created 0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
fauxpilot-triton-1 |
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 | == Triton Inference Server ==
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 |
fauxpilot-triton-1 | NVIDIA Release 22.06 (build 39726160)
fauxpilot-triton-1 | Triton Server Version 2.23.0
fauxpilot-triton-1 |
fauxpilot-triton-1 | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot-triton-1 | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot-triton-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot-triton-1 | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4090 GPU, which is not yet supported in this version of the container
fauxpilot-triton-1 | ERROR: No supported GPU(s) detected to run this container
fauxpilot-triton-1 |
fauxpilot-copilot_proxy-1 | INFO: Started server process [1]
fauxpilot-copilot_proxy-1 | INFO: Waiting for application startup.
fauxpilot-copilot_proxy-1 | INFO: Application startup complete.
fauxpilot-copilot_proxy-1 | INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-triton-1 | I0321 20:35:31.923611 88 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x204e00000' with size 268435456
fauxpilot-triton-1 | I0321 20:35:31.923725 88 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot-triton-1 | I0321 20:35:31.925317 88 model_repository_manager.cc:1191] loading: py-model:1
fauxpilot-triton-1 | I0321 20:35:32.028166 88 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: py-model_0 (CPU device 0)
fauxpilot-triton-1 | Cuda available? True
fauxpilot-triton-1 | is_half: True, int8: True, auto_device_map: True
How is the RTX 4090 not supported when there's an example of someone using it in the compatibility matrix?
Yes – the current container is getting a bit old. There is a branch open right now to update to a newer version of Triton (and hence a newer version of CUDA) but it needs a bit more testing before we can merge.
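To make the "why" concrete, here is a minimal sketch of the version mismatch. It assumes (per NVIDIA's public release notes) that the NGC 22.06 images ship CUDA 11.7, while the Ada Lovelace cards in this thread (RTX 4070/4080/4090, compute capability 8.9) are first supported in CUDA 11.8; the helper itself is hypothetical, not FauxPilot code.

```python
# Hypothetical illustration of the startup check that triggers the warning.
# Compute-capability values come from NVIDIA's documentation (assumed here):
#   - sm_89 (Ada / RTX 40-series) requires CUDA 11.8+
#   - the 22.06 container ships CUDA 11.7

ADA_COMPUTE_CAP = (8, 9)           # sm_89: RTX 4070/4080/4090

# Highest compute capability each CUDA release natively supports (assumed).
CONTAINER_MAX_CAP = {
    "11.7": (8, 7),                # pre-Ada architectures only
    "11.8": (9, 0),                # adds Ada (sm_89) and Hopper (sm_90)
}

def gpu_supported(gpu_cap: tuple, cuda_version: str) -> bool:
    """True if a GPU of this compute capability is natively supported."""
    # Tuples compare lexicographically: (8, 9) <= (8, 7) is False.
    return gpu_cap <= CONTAINER_MAX_CAP[cuda_version]

print(gpu_supported(ADA_COMPUTE_CAP, "11.7"))  # False -> the container's warning
print(gpu_supported(ADA_COMPUTE_CAP, "11.8"))  # True  -> what the update branch targets
```

Note that, as the RTX 4080 log later in this thread shows, the model can still load despite the warning, since forward compatibility sometimes papers over the gap.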
Same on RTX 4080, fails to launch:
=> [2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116 188.7s
=> [3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate 11.9s
=> exporting to image 21.8s
=> => exporting layers 21.8s
=> => writing image sha256:62d84f9ae8d8972163cc9f4af7b36307d059352727fe283dc4e8f5aee7ebfd4e 0.0s
=> => naming to docker.io/library/fauxpilot-triton 0.0s
[+] Running 3/3
✔ Network fauxpilot_default Created 0.1s
✔ Container fauxpilot-copilot_proxy-1 Created 0.1s
✔ Container fauxpilot-triton-1 Created 0.1s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
edit: was missing nvidia-docker.
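For anyone else hitting that daemon error: `could not select device driver "nvidia"` usually means the NVIDIA Container Toolkit isn't installed or registered with Docker. A rough sketch for Ubuntu/Debian, following NVIDIA's install guide (package names and repo setup may differ on your distro):

```shell
# Install the toolkit (assumes NVIDIA's apt repository is already configured)
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the nvidia runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: Docker should now see the GPU before re-running ./launch.sh
docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi
```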
I get the same warning (RTX 4080); however, it does actually work:
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 | == Triton Inference Server ==
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 |
fauxpilot-triton-1 | NVIDIA Release 22.06 (build 39726160)
fauxpilot-triton-1 | Triton Server Version 2.23.0
fauxpilot-triton-1 |
fauxpilot-triton-1 | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot-triton-1 | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot-triton-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot-triton-1 | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4080 GPU, which is not yet supported in this version of the container
fauxpilot-triton-1 | ERROR: No supported GPU(s) detected to run this container
fauxpilot-triton-1 |
fauxpilot-copilot_proxy-1 | INFO: Started server process [1]
fauxpilot-copilot_proxy-1 | INFO: Waiting for application startup.
fauxpilot-copilot_proxy-1 | INFO: Application startup complete.
fauxpilot-copilot_proxy-1 | INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-triton-1 | I0407 09:10:42.518966 88 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f34f8000000' with size 268435456
fauxpilot-triton-1 | I0407 09:10:42.519269 88 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot-triton-1 | I0407 09:10:42.531040 88 model_repository_manager.cc:1191] loading: fastertransformer:1
fauxpilot-triton-1 | I0407 09:10:42.739821 88 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
fauxpilot-triton-1 | I0407 09:10:42.739838 88 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.10
fauxpilot-triton-1 | I0407 09:10:42.739842 88 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.10
fauxpilot-triton-1 | I0407 09:10:42.739875 88 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
// ...
fauxpilot-triton-1 | I0407 09:10:42.911052 88 libfastertransformer.cc:307] Before Loading Model:
fauxpilot-triton-1 | after allocation, free 14.62 GB total 15.70 GB
fauxpilot-triton-1 | [WARNING] gemm_config.in is not found; using default GEMM algo
fauxpilot-triton-1 | I0407 09:11:18.353777 88 libfastertransformer.cc:321] After Loading Model:
fauxpilot-triton-1 | after allocation, free 1.10 GB total 15.70 GB
fauxpilot-triton-1 | I0407 09:11:18.355366 88 libfastertransformer.cc:537] Model instance is created on GPU NVIDIA GeForce RTX 4080
fauxpilot-triton-1 | I0407 09:11:18.358094 88 model_repository_manager.cc:1345] successfully loaded 'fastertransformer' version 1
fauxpilot-triton-1 | I0407 09:11:18.358381 88 server.cc:556]
$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"prompt":"def hello","max_tokens":100,"temperature":0.1,"stop":["\n\n"]}' http://localhost:5000/v1/engines/codegen/completions | jq
{
"id": "cmpl-h1jvz164uehkbl0tufObee0ZFUDqe",
"model": "codegen",
"object": "text_completion",
"created": 1680859410,
"choices": [
{
"text": "(self):\n return \"Hello World\"",
"index": 0,
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"completion_tokens": 11,
"prompt_tokens": 2,
"total_tokens": 13
}
}
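The same request can be made from Python with only the standard library. This is a sketch mirroring the curl call above: the endpoint and payload shape come from this thread's example, but the helper names are my own and error handling is omitted.

```python
import json
import urllib.request

# Endpoint from the curl example above (assumes the proxy is on localhost:5000)
API_URL = "http://localhost:5000/v1/engines/codegen/completions"

def build_payload(prompt: str, max_tokens: int = 100) -> dict:
    """Request body in the same shape as the curl example."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.1,
        "stop": ["\n\n"],
    }

def first_completion(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style completion response."""
    return response["choices"][0]["text"]

def complete(prompt: str) -> str:
    """POST the prompt to the proxy and return the first completion."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return first_completion(json.load(resp))
```

With the server running, `complete("def hello")` should return the generated text, like the `"(self):\n    return \"Hello World\""` completion shown above.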
That said, cuDNN 8.8 has major improvements for the 40-series cards; it would be nice to get that library updated too.
Getting this with a 4070 as well.
[triton] |
[triton] | =============================
[triton] | == Triton Inference Server ==
[triton] | =============================
[triton] |
[triton] | NVIDIA Release 22.06 (build 39726160)
[triton] | Triton Server Version 2.23.0
[triton] |
[triton] | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
[triton] |
[triton] | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
[triton] |
[triton] | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
[triton] | By pulling and using the container, you accept the terms and conditions of this license:
[triton] | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
[triton] | WARNING: Detected NVIDIA NVIDIA GeForce RTX 4070 Laptop GPU GPU, which is not yet supported in this version of the container
[triton] | ERROR: No supported GPU(s) detected to run this container
[triton] |
Where is moyix/triton_with_ft:22.09 built from? I see it on Docker Hub but I can't find the Dockerfile anywhere to try updating stuff.