Multi-GPU Triton failures
Hi! I am working on serving a model via Triton on a multi-GPU node (AWS ml.g4dn.12xlarge). It seems I can only get it to successfully serve multiple requests if I limit it to a single GPU. If I allow more than one GPU, the first request succeeds but the second causes the error shown below.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA][ERROR] /opt/lightseq/lightseq/inference/pywrapper/transformer.cc(119): cudaErrorIllegalAddress an illegal memory access was encountered

Signal (6) received.
 0# 0x000055CED2AA3EB9 in tritonserver
 1# 0x00007F1BC4CAC210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F1BC5062911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F1BC506E38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F1BC506E3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F1BC506E6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 9# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
10# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
11# 0x00007F1BC583410A in /opt/tritonserver/lib/libtritonserver.so
12# 0x00007F1BC58349B7 in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1BC56E03C1 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1BC582DF87 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1BC509ADE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
16# 0x00007F1BC5518609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
17# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Signal (11) received.
 0# 0x000055CED2AA3EB9 in tritonserver
 1# 0x00007F1BC4CAC210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007F1BC5062911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# 0x00007F1BC506E38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F1BC506E3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F1BC506E6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 8# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
 9# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
10# 0x00007F1BC583410A in /opt/tritonserver/lib/libtritonserver.so
11# 0x00007F1BC58349B7 in /opt/tritonserver/lib/libtritonserver.so
12# 0x00007F1BC56E03C1 in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1BC582DF87 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1BC509ADE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
15# 0x00007F1BC5518609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
16# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
I am using the model generated from the fairseq export example and the Docker image hexisyztem/tritonserver_lightseq:22.01-1.
What does your configuration file look like? I guess you may have assigned all the models to GPU 0, but I would need to analyze this together with your configuration.
By the way, instance_group count needs to be set to 1. https://github.com/bytedance/lightseq/blob/master/examples/triton_backend/model_repo/transformer_example/config.pbtxt#L25
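For reference, a minimal sketch of that stanza, assuming Triton's standard instance_group schema (the optional gpus field is not in the linked example and is added here for illustration; it pins the instance to a specific device):

instance_group [
  {
    count: 1          # the LightSeq backend requires a single instance
    kind: KIND_GPU
    gpus: [ 0 ]       # illustrative: restrict this instance to GPU 0
  }
]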
Here is my configuration file. It matches the example, except that it uses the en-de model from the export example rather than the bart model. The model was also modified to output only a single result (beam 1), which is why the output dims have only a single dimension. To limit serving to a single GPU, no change was made to the config file; instead, Docker was launched with gpus=1 (see also the note after the config below).
name: "generator"
backend: "lightseq"
max_batch_size: 8
input [
{
name: "source_ids"
data_type: TYPE_INT32
dims: [ -1 ]
}
]
output [
{
name: "target_ids"
data_type: TYPE_INT32
dims: [-1 ]
},
{
name: "target_scores"
data_type: TYPE_FP32
dims: [-1 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
default_model_filename: "chkpt_native.hdf5"
parameters: [
{
key: "model_type"
value: {
string_value: "Transformer"
}
}
]
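As an aside, if I read Triton's documentation correctly, KIND_GPU with no gpus field creates count instances on every visible GPU, so with two GPUs exposed the instance_group above should be equivalent to something like this explicit form (device indices illustrative):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]     # first instance pinned to GPU 0
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]     # second instance pinned to GPU 1
  }
]

That would mean a separate LightSeq instance is created whenever an additional GPU is visible, which may be relevant to the crash on the second request.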
Attached are logs for runs with one and two GPUs, after building the Docker image from source and enabling debug mode. logs_single_gpu.txt logs_double_gpu.txt