fastertransformer_backend

FT backend crashes Triton server if batch size is too large

Open · moyix opened this issue 2 years ago

Description

Branch: main
Docker version: 22.03
GPU type: 2x NVIDIA RTX A6000

Steps to Reproduce

  1. Load a model with the fastertransformer backend.
  2. Send an inference request whose batch size is too large to fit in GPU memory.

The server crashes with:

terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: out of memory /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/memory_utils.cu:26 

[gv013:3677168] *** Process received signal ***
[gv013:3677168] Signal: Aborted (6)
[gv013:3677168] Signal code:  (-6)
[gv013:3677168] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x1472e53a8420]
[gv013:3677168] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x1472e3d9c00b]
[gv013:3677168] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x1472e3d7b859]
[gv013:3677168] [ 3] /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x1472e4155911]
[gv013:3677168] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x1472e416138c]
[gv013:3677168] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x1472e41613f7]
[gv013:3677168] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x1472e41616a9]
[gv013:3677168] [ 7] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer5checkI9cudaErrorEEvT_PKcS4_i+0x219)[0x147265e83ab9]
[gv013:3677168] [ 8] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer12deviceMallocI6__halfEEvPPT_ib+0x36)[0x147265ff6146]
[gv013:3677168] [ 9] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer10GptJWeightI6__halfE13mallocWeightsEv+0x60)[0x147265eccc40]
[gv013:3677168] [10] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer10GptJWeightI6__halfEC2Eiiiiiiiii+0x148)[0x147265ed05e8]
[gv013:3677168] [11] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN15GptJTritonModelI6__halfE19createModelInstanceEiiP11CUstream_stSt4pairISt6vectorIP8ncclCommSaIS7_EES9_ESt10shared_ptrIN17fastertransformer18AbstractCustomCommEE+0x3f7)[0x147265ec23e7]
[gv013:3677168] [12] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x16eb3)[0x1472daa44eb3]
[gv013:3677168] [13] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x173c2)[0x1472daa453c2]
[gv013:3677168] [14] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2b23d)[0x1472daa5923d]
[gv013:3677168] [15] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x1472e418dde4]
[gv013:3677168] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x1472e539c609]
[gv013:3677168] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x1472e3e78133]
[gv013:3677168] *** End of error message ***
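The message `terminate called after throwing an instance of 'std::runtime_error'` and the bottom frames (libstdc++, libpthread, `clone`) appear to show that the exception was thrown on a backend worker thread and never caught there. In C++, an exception that escapes a `std::thread` entry function calls `std::terminate`, which aborts the whole process. Below is a minimal sketch of the usual remedy: catch inside the thread and hand the exception back to the caller through `std::exception_ptr`. It assumes a hypothetical `AllocateWeights` that throws `std::runtime_error` in the same way the FT `check()` frame in the trace does. This is not FT or Triton code.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for a weight allocation that fails the way the
// trace shows: by throwing std::runtime_error on a CUDA error.
void AllocateWeights(bool simulate_oom) {
  if (simulate_oom) {
    throw std::runtime_error("[FT][ERROR] CUDA runtime error: out of memory");
  }
}

// Runs the allocation on a worker thread. Any exception is caught inside the
// thread and stored, so it never escapes the thread function (which would
// call std::terminate and abort the process).
std::exception_ptr AllocateOnWorker(bool simulate_oom) {
  std::exception_ptr error = nullptr;
  std::thread worker([&error, simulate_oom] {
    try {
      AllocateWeights(simulate_oom);
    } catch (...) {
      error = std::current_exception();  // transport the exception out
    }
  });
  worker.join();  // join before reading `error`, so there is no data race
  return error;
}

// Turns a stored exception back into a message on the caller's thread.
std::string DescribeError(const std::exception_ptr& error) {
  if (!error) {
    return "ok";
  }
  try {
    std::rethrow_exception(error);
  } catch (const std::exception& e) {
    return e.what();
  } catch (...) {
    return "unknown error";
  }
}
```

With this pattern, `DescribeError(AllocateOnWorker(true))` returns the out-of-memory message on the calling thread, and the process keeps running.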

It would be better if the FT backend detected the out-of-memory condition and returned an error for the offending request, rather than throwing an uncaught exception that aborts the whole server.
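The requested behavior can be sketched as a per-request wrapper that converts a failure into a status code instead of letting it propagate. Everything here is hypothetical: `RunModel`, `HandleRequest`, `RequestStatus`, and the bytes-per-item memory model are illustrations, not the FT or Triton backend API. Matching on the message text is only a stand-in. A real fix would more likely check the CUDA error code (for example `cudaErrorMemoryAllocation`) before it is turned into a string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical status reported back to the client for a single request.
enum class RequestStatus { kOk, kOutOfMemory, kInternalError };

struct RequestResult {
  RequestStatus status;
  std::string message;
};

// Hypothetical model call: assumes each batch element needs a fixed number of
// bytes and throws, like the allocation path in the trace, if it won't fit.
void RunModel(std::size_t batch_size, std::size_t bytes_per_item,
              std::size_t free_bytes) {
  if (batch_size * bytes_per_item > free_bytes) {
    throw std::runtime_error("CUDA runtime error: out of memory");
  }
}

// Catches failures for this request only and maps them to a status, so the
// server process keeps serving other requests.
RequestResult HandleRequest(std::size_t batch_size, std::size_t bytes_per_item,
                            std::size_t free_bytes) {
  try {
    RunModel(batch_size, bytes_per_item, free_bytes);
    return {RequestStatus::kOk, ""};
  } catch (const std::runtime_error& e) {
    const std::string what = e.what();
    // Text matching is a placeholder for checking the underlying CUDA error.
    const RequestStatus status = what.find("out of memory") != std::string::npos
                                     ? RequestStatus::kOutOfMemory
                                     : RequestStatus::kInternalError;
    return {status, what};
  } catch (...) {
    return {RequestStatus::kInternalError, "unknown error"};
  }
}
```

For example, `HandleRequest(64, 100, 1000)` yields `kOutOfMemory`, while `HandleRequest(4, 100, 1000)` returns `kOk`. Neither call terminates the process.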

moyix, Aug 05 '22 13:08