Error when I run the Docker Hub container or build from scratch

Open · cowmix opened this issue 1 year ago · 8 comments

2023-04-09 00:05:43,013 DEBG 'triton' stderr output:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:274

[e79652de0dc1:01747] *** Process received signal ***
[e79652de0dc1:01747] Signal: Aborted (6)
[e79652de0dc1:01747] Signal code:  (-6)
[e79652de0dc1:01747] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fbedac13420]
[e79652de0dc1:01747] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fbed949e00b]
[e79652de0dc1:01747] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fbed947d859]
[e79652de0dc1:01747] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7fbed9857911]
[e79652de0dc1:01747] [ 4]
2023-04-09 00:05:43,014 DEBG 'triton' stderr output:
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7fbed986338c]
[e79652de0dc1:01747] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7fbed98633f7]
[e79652de0dc1:01747] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7fbed98636a9]
[e79652de0dc1:01747] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3b949)[0x7fbebc686949]
[e79652de0dc1:01747] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x20f65)[0x7fbebc66bf65]
[e79652de0dc1:01747] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2d794)[0x7fbebc678794]
[e79652de0dc1:01747] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x38d)[0x7fbebc678e0d]
[e79652de0dc1:01747] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10689b)[0x7fbed9d4889b]
[e79652de0dc1:01747] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c4f5d)[0x7fbed9e06f5d]
[e79652de0dc1:01747] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1caccd)[0x7fbed9e0cccd]
[e79652de0dc1:01747] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3083a0)[0x7fbed9f4a3a0]
[e79652de0dc1:01747] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fbed988fde4]
[e79652de0dc1:01747] [16] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fbedac07609]
[e79652de0dc1:01747] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fbed957a133]
[e79652de0dc1:01747] *** End of error message ***

cowmix, Apr 09 '23 00:04

Could you provide the output of nvidia-smi?

wsxiaoys, Apr 09 '23 00:04

What's the output from this?

sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

This is the verification command from the NVIDIA Container Toolkit setup guide:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit

ghthor, Apr 09 '23 17:04

Sun Apr  9 18:09:43 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   34C    P8    16W / 170W |     67MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

cowmix, Apr 09 '23 18:04

What is your GPU? The nvidia-smi output cuts off after "GeForce"...

wsxiaoys, Apr 09 '23 23:04

name, pci.bus_id, vbios_version
NVIDIA GeForce RTX 3060, 00000000:01:00.0, 94.06.25.00.FD

cowmix, Apr 10 '23 03:04

I haven't found any clues yet, but a Google search suggests it might be caused by a mismatched CUDA version.

wsxiaoys, Apr 10 '23 08:04
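The CUDA runtime API documents this message as `cudaErrorUnsupportedPtxVersion`; its most common cause is PTX generated by a compiler newer than the driver's PTX JIT compiler supports. The `CUDA Version` field in the `nvidia-smi` header above (11.7 here) is the newest CUDA version the installed driver supports, so one quick check is to compare it with the CUDA version the image was built with. The sketch below assumes you know that build version (the thread does not state it), and the function names are hypothetical helpers, not part of Tabby or the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Sketch: does the host driver support the CUDA version an image was built with?

# Print the "CUDA Version: X.Y" value from nvidia-smi's default output (read on stdin).
# This is the newest CUDA version the installed driver supports.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 >= version $2 (relies on `sort -V` from GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_cuda_support SUPPORTED REQUIRED
# Report whether a driver supporting CUDA SUPPORTED can run code built with CUDA REQUIRED.
check_cuda_support() {
  if version_ge "$1" "$2"; then
    echo "ok: driver supports CUDA $1 >= $2"
  else
    echo "mismatch: driver supports CUDA $1 < $2; update the host driver"
    return 1
  fi
}
```

Usage, with `11.8` as a hypothetical build version: `check_cuda_support "$(nvidia-smi | driver_cuda_version)" 11.8`. With the 515.65.01 driver shown above, this prints `mismatch: driver supports CUDA 11.7 < 11.8; update the host driver` and returns 1.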

@wsxiaoys Weird. Everything should be contained within the Docker container, right?

cowmix, Apr 10 '23 14:04

I was wondering that too, but to be safe I made sure the host OS had the latest NVIDIA driver as well. I haven't tested without installing that driver on the host.

ghthor, Apr 10 '23 15:04
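On the question of whether everything is contained in the Docker container: with the NVIDIA Container Toolkit, the image supplies the CUDA toolkit libraries, but the kernel driver and the user-mode driver library (`libcuda.so`) come from the host and are mounted into the container at run time. The host driver version therefore decides which CUDA builds can run, which is why updating the host driver is the relevant step. A minimal sketch of that check follows. The minimum-driver values are the Linux x86_64 GA entries from NVIDIA's CUDA Toolkit release notes, abbreviated to four versions, and should be checked against the current notes; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: minimum Linux x86_64 driver for a CUDA toolkit (GA releases only).
# Abbreviated table; verify values against NVIDIA's current CUDA Toolkit release notes.
min_driver_for_cuda() {
  case "$1" in
    11.7) echo "515.43.04" ;;
    11.8) echo "520.61.05" ;;
    12.0) echo "525.60.13" ;;
    12.1) echo "530.30.02" ;;
    *) return 1 ;;  # not in this abbreviated table
  esac
}

# Succeed if version $1 >= version $2 (relies on `sort -V` from GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# host_driver_ok DRIVER CUDA
# Succeed if a host running driver DRIVER meets the minimum for a CUDA-$CUDA build.
host_driver_ok() {
  local min
  min="$(min_driver_for_cuda "$2")" || { echo "unknown CUDA version: $2" >&2; return 2; }
  version_ge "$1" "$min"
}
```

Usage: `host_driver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 11.8 && echo ok || echo "update the host driver"`. For the 515.65.01 driver reported above and a hypothetical CUDA 11.8 build, the check fails.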