Multi-GPU Triton failures
Hi! I am working on serving a model via Triton on a multi-GPU node (AWS ml.g4dn.12xlarge). It seems I can only get it to successfully serve multiple requests if I limit it to a single GPU. If I allow more than one GPU, the first request succeeds but the second causes the error shown below.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA][ERROR] /opt/lightseq/lightseq/inference/pywrapper/transformer.cc(119): cudaErrorIllegalAddress an illegal memory access was encountered

Signal (6) received.
 0# 0x000055CED2AA3EB9 in tritonserver
 1# 0x00007F1BC4CAC210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F1BC5062911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F1BC506E38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F1BC506E3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F1BC506E6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 9# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
10# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
11# 0x00007F1BC583410A in /opt/tritonserver/lib/libtritonserver.so
12# 0x00007F1BC58349B7 in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1BC56E03C1 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1BC582DF87 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1BC509ADE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
16# 0x00007F1BC5518609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
17# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Signal (11) received.
 0# 0x000055CED2AA3EB9 in tritonserver
 1# 0x00007F1BC4CAC210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007F1BC5062911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# 0x00007F1BC506E38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F1BC506E3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F1BC506E6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 8# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
 9# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
10# 0x00007F1BC583410A in /opt/tritonserver/lib/libtritonserver.so
11# 0x00007F1BC58349B7 in /opt/tritonserver/lib/libtritonserver.so
12# 0x00007F1BC56E03C1 in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1BC582DF87 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1BC509ADE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
15# 0x00007F1BC5518609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
16# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
I am using the model generated from the fairseq export example and the Docker image hexisyztem/tritonserver_lightseq:22.01-1.
What does your configuration file look like? I guess you may have assigned all the models to GPU 0, but I would need to analyze this together with your configuration.
By the way, instance_group count needs to be set to 1. https://github.com/bytedance/lightseq/blob/master/examples/triton_backend/model_repo/transformer_example/config.pbtxt#L25
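For reference, a minimal sketch of that stanza, assuming Triton's standard instance_group schema (the optional gpus field is not in the linked example and is added here for illustration; it pins the instance to a specific device):

instance_group [
  {
    count: 1          # the LightSeq backend requires a single instance
    kind: KIND_GPU
    gpus: [ 0 ]       # illustrative: restrict this instance to GPU 0
  }
]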
Here is my configuration file. It matches the example, except that it uses the en-de model from the export example rather than the bart model. The model was also modified to output only a single result (beam 1), which is why the output dims have only a single dimension. To limit serving to a single GPU, no change was made to the config file; instead, Docker was launched with gpus=1 (see also the note after the config below).
name: "generator"
backend: "lightseq"
max_batch_size: 8
input [
{
name: "source_ids"
data_type: TYPE_INT32
dims: [ -1 ]
}
]
output [
{
name: "target_ids"
data_type: TYPE_INT32
dims: [-1 ]
},
{
name: "target_scores"
data_type: TYPE_FP32
dims: [-1 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
default_model_filename: "chkpt_native.hdf5"
parameters: [
{
key: "model_type"
value: {
string_value: "Transformer"
}
}
]
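As an aside, if I read Triton's documentation correctly, KIND_GPU with no gpus field creates count instances on every visible GPU, so with two GPUs exposed the instance_group above should be equivalent to something like this explicit form (device indices illustrative):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]     # first instance pinned to GPU 0
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]     # second instance pinned to GPU 1
  }
]

That would mean a separate LightSeq instance is created whenever an additional GPU is visible, which may be relevant to the crash on the second request.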
Attached are logs for runs with one and two GPUs, after building the Docker image from source and enabling debug mode. logs_single_gpu.txt logs_double_gpu.txt