fastertransformer_backend
CUDA: Operation Not Supported
Description
Hi, I'm trying to run triton:22.03 / FasterTransformer within a Kubernetes pod.
Running
CUDA_VISIBLE_DEVICES=0 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/gptj/
gives me this error:
what(): [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:160
The failing call is
check_cuda_error(cudaDeviceGetDefaultMemPool(&mempool, device_id));
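For reference, cudaDeviceGetDefaultMemPool is part of the stream-ordered memory allocator introduced in CUDA 11.2, and it returns "operation not supported" on devices that don't expose memory-pool support (this can happen on some virtualized/vGPU profiles). A minimal standalone probe (my own sketch, not part of FasterTransformer) that checks whether the device advertises this capability:

```cpp
// probe_mempool.cu -- build with: nvcc probe_mempool.cu -o probe_mempool
// Hypothetical diagnostic, not from the original report: queries
// cudaDevAttrMemoryPoolsSupported, the attribute gating the pool APIs
// that cudaDeviceGetDefaultMemPool relies on.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    int pools_supported = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    if (err != cudaSuccess) {
        std::printf("cudaDeviceGetAttribute failed: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }
    std::printf("cudaDevAttrMemoryPoolsSupported = %d\n", pools_supported);
    return 0;
}
```

If this prints 0, the crash in allocator.h is expected on that device, independent of the driver/toolkit versions shown below.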
**Full error log**
root@triton-deployment:/workspace/build/fastertransformer_backend/all_models/gptj/fastertransformer# CUDA_VISIBLE_DEVICES=0 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/gptj/
I0504 19:10:00.078200 3296 libtorch.cc:1309] TRITONBACKEND_Initialize: pytorch
I0504 19:10:00.078309 3296 libtorch.cc:1319] Triton TRITONBACKEND API version: 1.8
I0504 19:10:00.078314 3296 libtorch.cc:1325] 'pytorch' TRITONBACKEND API version: 1.8
2023-05-04 19:10:00.248359: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-05-04 19:10:00.281753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0504 19:10:00.281830 3296 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0504 19:10:00.281850 3296 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0504 19:10:00.281854 3296 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0504 19:10:00.281858 3296 tensorflow.cc:2216] backend configuration:
{}
I0504 19:10:00.283495 3296 onnxruntime.cc:2319] TRITONBACKEND_Initialize: onnxruntime
I0504 19:10:00.283521 3296 onnxruntime.cc:2329] Triton TRITONBACKEND API version: 1.8
I0504 19:10:00.283526 3296 onnxruntime.cc:2335] 'onnxruntime' TRITONBACKEND API version: 1.8
I0504 19:10:00.283529 3296 onnxruntime.cc:2365] backend configuration:
{}
I0504 19:10:00.299472 3296 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0504 19:10:00.299491 3296 openvino.cc:1217] Triton TRITONBACKEND API version: 1.8
I0504 19:10:00.299496 3296 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.8
I0504 19:10:00.588906 3296 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x10018000000' with size 268435456
I0504 19:10:00.589474 3296 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
E0504 19:10:00.591299 3296 model_repository_manager.cc:1927] Poll failed for model directory 'ensemble': ensemble input 'runtime_top_k' is optional, optional ensemble input is not currently supported
I0504 19:10:00.595117 3296 model_repository_manager.cc:997] loading: preprocessing:1
I0504 19:10:00.695602 3296 model_repository_manager.cc:997] loading: postprocessing:1
I0504 19:10:00.703348 3296 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0 (CPU device 0)
I0504 19:10:00.796223 3296 model_repository_manager.cc:997] loading: fastertransformer:1
I0504 19:10:02.929660 3296 model_repository_manager.cc:1152] successfully loaded 'preprocessing' version 1
I0504 19:10:03.063112 3296 libfastertransformer.cc:1828] TRITONBACKEND_Initialize: fastertransformer
I0504 19:10:03.063213 3296 libfastertransformer.cc:1838] Triton TRITONBACKEND API version: 1.8
I0504 19:10:03.063328 3296 libfastertransformer.cc:1844] 'fastertransformer' TRITONBACKEND API version: 1.8
I0504 19:10:03.063415 3296 libfastertransformer.cc:1876] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0504 19:10:03.064111 3296 libfastertransformer.cc:372] Instance group type: KIND_CPU count: 1
I0504 19:10:03.064170 3296 libfastertransformer.cc:402] Sequence Batching: disabled
I0504 19:10:03.064207 3296 libfastertransformer.cc:412] Dynamic Batching: disabled
I0504 19:10:03.064357 3296 libfastertransformer.cc:438] Before Loading Weights:
after allocation : free: 28.85 GB, total: 32.00 GB, used: 3.15 GB
I0504 19:10:16.380581 3296 libfastertransformer.cc:448] After Loading Weights:
after allocation : free: 17.58 GB, total: 32.00 GB, used: 14.42 GB
I0504 19:10:16.381466 3296 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0 (CPU device 0)
I0504 19:10:17.839193 3296 libfastertransformer.cc:472] Before Loading Model:
I0504 19:10:17.839390 3296 model_repository_manager.cc:1152] successfully loaded 'postprocessing' version 1
after allocation : free: 17.58 GB, total: 32.00 GB, used: 14.42 GB
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:160
[triton-deployment:03296] *** Process received signal ***
[triton-deployment:03296] Signal: Aborted (6)
[triton-deployment:03296] Signal code: (-6)
[triton-deployment:03296] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f4715572420]
[triton-deployment:03296] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f4714cf800b]
[triton-deployment:03296] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f4714cd7859]
[triton-deployment:03296] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f47150b1911]
[triton-deployment:03296] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f47150bd38c]
[triton-deployment:03296] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f47150bd3f7]
[triton-deployment:03296] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f47150bd6a9]
[triton-deployment:03296] [ 7] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer5checkI9cudaErrorEEvT_PKcS4_i+0x219)[0x7f459d04ebe9]
[triton-deployment:03296] [ 8] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer9AllocatorILNS_13AllocatorTypeE0EEC1Ei+0x123)[0x7f459d08b813]
[triton-deployment:03296] [ 9] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN15GptJTritonModelI6__halfE19createModelInstanceEiiP11CUstream_stSt4pairISt6vectorIN17fastertransformer9NcclParamESaIS7_EES9_ESt10shared_ptrINS6_18AbstractCustomCommEE+0xad)[0x7f459d14a68d]
[triton-deployment:03296] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x19a38)[0x7f46402d6a38]
[triton-deployment:03296] [11] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1a323)[0x7f46402d7323]
[triton-deployment:03296] [12] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3c11e)[0x7f46402f911e]
[triton-deployment:03296] [13] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f47150e9de4]
[triton-deployment:03296] [14] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f4715566609]
[triton-deployment:03296] [15] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4714dd4133]
[triton-deployment:03296] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
I0504 19:10:18.364089 3359 pb_stub.cc:821] Non-graceful termination detected.
I0504 19:10:18.512814 3300 pb_stub.cc:821] Non-graceful termination detected.
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node triton-deployment exited on signal 6 (Aborted).
I've gotten this same error with both GPT-J and T5. It's likely a CUDA problem, but as far as I know I have the correct versions.
Here is my nvidia-smi output:
root@triton-deployment:/workspace/build/fastertransformer_backend/all_models/gptj/fastertransformer# nvidia-smi
Thu May 4 19:15:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GRID V100D-32C On | 00000000:02:00.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
and "nvcc -version":
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Thu_Feb_10_18:23:41_PST_2022
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
Help would be appreciated, thanks!
### Reproduced Steps
CUDA_VISIBLE_DEVICES=0 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/gptj/
Related:
https://github.com/NVIDIA/FasterTransformer/issues/592