Server crashes on loading shared libraries

PRIYANKArythem3 opened this issue · 12 comments

Description: I am in the development phase of running a deep learning model on Triton Inference Server. I am using the LD_PRELOAD trick to load the custom ops needed to support inference, but as soon as I do that, the server crashes with the following error:


Signal (11) received.
 0# 0x000055D4EB0F28A9 in tritonserver
 1# 0x00007F62E3251210 in /lib/x86_64-linux-gnu/libc.so.6
 2# re2::RE2::Match(re2::StringPiece const&, unsigned long, unsigned long, re2::RE2::Anchor, re2::StringPiece*, int) const in /lib/x86_64-linux-gnu/libre2.so.5
 3# re2::RE2::DoMatch(re2::StringPiece const&, re2::RE2::Anchor, unsigned long*, re2::RE2::Arg const* const*, int) const in /lib/x86_64-linux-gnu/libre2.so.5
 4# 0x000055D4EB0F7196 in tritonserver
 5# 0x000055D4EB2BD226 in tritonserver
 6# 0x000055D4EB2C20A8 in tritonserver
 7# 0x000055D4EB2BFE8E in tritonserver
 8# 0x000055D4EB2A6E20 in tritonserver
 9# 0x000055D4EB2AE940 in tritonserver
10# 0x000055D4EB2AF38F in tritonserver
11# 0x000055D4EB2C3CE2 in tritonserver
12# 0x00007F62E3ABD609 in /lib/x86_64-linux-gnu/libpthread.so.0
13# clone in /lib/x86_64-linux-gnu/libc.so.6


Segmentation fault

Triton Information
What version of Triton are you using?

Environment
TensorRT Version:
GPU Type: GPU 0: Tesla V100-SXM2-16GB
TRITON_SERVER_VERSION=2.15.0
NVIDIA_TRITON_SERVER_VERSION=21.10
NSIGHT_SYSTEMS_VERSION=2021.3.2.4
Triton Image: 21.10
CUDA Version: CUDA_VERSION=11.4.3.001
CUDA Driver Version: CUDA_DRIVER_VERSION=470.57.02
CUDNN Version:
Operating System + Version: Linux 0e2e4678631d 5.10.47-linuxkit #1 SMP Sat Jul 3 21:51:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Python Version (if applicable): python3.8

Are you using the Triton container or did you build it yourself? Triton Container

To Reproduce
Steps to reproduce the behavior:

docker run -it \
 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
 -v /Users/priyankasaraf/repo/triton/selftest/model_dir/OutNonPyFuncFP32:/models \
 -v /Users/priyankasaraf/repo/triton/selftest/binaries:/triton/lib \
 --entrypoint /bin/bash \
 nvcr.io/nvidia/tritonserver:21.10-py3
export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH && export LD_PRELOAD="/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so /triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so" && tritonserver --backend-config=tensorflow,version=2 --model-repository=/models \
--model-control-mode=explicit \
--log-verbose=5 --log-info=true --log-warning=true --log-error=true \
--http-port=8000 --grpc-port=8001 \
--metrics-port=8002


Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior
The server must not crash.

PRIYANKArythem3 avatar Jul 19 '22 00:07 PRIYANKArythem3

Thanks for bringing this to our attention. I have filed a ticket for us to investigate further.

kthui avatar Jul 19 '22 17:07 kthui

Hi @PRIYANKArythem3 , have you tried to include Triton lib first in the LD_PRELOAD to avoid any potential symbol issues as discussed here: https://github.com/triton-inference-server/server/issues/4456#issuecomment-1164811121? Perhaps this issue is similarly related.

LD_PRELOAD="/opt/tritonserver/lib/libtritonserver.so <custom_ops>"

rmccorm4 avatar Jul 19 '22 21:07 rmccorm4

I have done some more investigation and found that the server does not always crash; it depends on the configuration:

Scenario 1: Use _regex_split_ops.so with --metrics-port=8002. This shared library is needed for one of our client's models. With it loaded and the metrics port at 8002, the HTTP endpoint at port 8000 returns 400s and the server hits a segmentation fault. Server start script:

priyankasaraf@priyank-ltmatct triton % docker run -it \
 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
 -v /Users/priyankasaraf/repo/triton/selftest/model_dir/OutNonPyFuncFP32:/models \
 -v /Users/priyankasaraf/repo/triton/selftest/binaries:/triton/lib \
 --entrypoint /bin/bash \
 nvcr.io/nvidia/tritonserver:21.10-py3
root@38f60d0f0e2d:/opt/tritonserver# export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH && export LD_PRELOAD="/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so /triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so" && tritonserver --backend-config=tensorflow,version=2 --model-repository=/models \
--model-control-mode=explicit --log-verbose=5 --log-info=true --log-warning=true --log-error=true --http-port=8000 --grpc-port=8001 --metrics-port=8002

Segmentation fault logs:

Signal (11) received.
 0# 0x0000560479D798A9 in tritonserver
 1# 0x00007EFFCD850210 in /lib/x86_64-linux-gnu/libc.so.6
 2# re2::RE2::Match(re2::StringPiece const&, unsigned long, unsigned long, re2::RE2::Anchor, re2::StringPiece*, int) const in /lib/x86_64-linux-gnu/libre2.so.5
 3# re2::RE2::DoMatch(re2::StringPiece const&, re2::RE2::Anchor, unsigned long*, re2::RE2::Arg const* const*, int) const in /lib/x86_64-linux-gnu/libre2.so.5
 4# 0x0000560479D7E196 in tritonserver
 5# 0x0000560479F44226 in tritonserver
 6# 0x0000560479F490A8 in tritonserver
 7# 0x0000560479F46E8E in tritonserver
 8# 0x0000560479F2DE20 in tritonserver
 9# 0x0000560479F35940 in tritonserver
10# 0x0000560479F3638F in tritonserver
11# 0x0000560479F4ACE2 in tritonserver
12# 0x00007EFFCE0BC609 in /lib/x86_64-linux-gnu/libpthread.so.0
13# clone in /lib/x86_64-linux-gnu/libc.so.6

Segmentation fault

Scenario 2: If I change to --metrics-port=8003, or do not add _regex_split_ops.so, the server runs as expected, with the HTTP endpoint returning 200 (OK).

Server start script:

priyankasaraf@priyank-ltmatct triton % docker run -it \
 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
 -v /Users/priyankasaraf/repo/triton/selftest/model_dir/OutNonPyFuncFP32:/models \
 -v /Users/priyankasaraf/repo/triton/selftest/binaries:/triton/lib \
 --entrypoint /bin/bash \
 nvcr.io/nvidia/tritonserver:21.10-py3
root@38f60d0f0e2d:/opt/tritonserver# export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH && export LD_PRELOAD="/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so /triton/lib/_wordpiece_tokenizer.so" && tritonserver --backend-config=tensorflow,version=2 --model-repository=/models \
--model-control-mode=explicit --log-verbose=5 --log-info=true --log-warning=true --log-error=true --http-port=8000 --grpc-port=8001 --metrics-port=8003

PRIYANKArythem3 avatar Jul 19 '22 21:07 PRIYANKArythem3

Hi @PRIYANKArythem3 , have you tried to include Triton lib first in the LD_PRELOAD to avoid any potential symbol issues as discussed here: #4456 (comment)? Perhaps this issue is similarly related.

LD_PRELOAD="/opt/tritonserver/lib/libtritonserver.so <custom_ops>"

Tried this, but it still ends up in a segmentation fault.

Requesting/curling the metrics endpoint on port 8002 --> segmentation fault, and the HTTP endpoint returns 400.

Note: The segmentation fault does not happen until a request is made to the metrics endpoint at 8002/metrics.
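
For reference, the trigger is just a single scrape of the metrics endpoint; a minimal sketch of that request using the Python requests library (assuming the server was started with --metrics-port=8002 as above):

import requests

# One GET to the metrics endpoint is enough to trigger the crash.
resp = requests.get("http://localhost:8002/metrics")
print(resp.status_code)
print(resp.text[:200])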

PRIYANKArythem3 avatar Jul 19 '22 22:07 PRIYANKArythem3

@PRIYANKArythem3 The version of Triton you are using is quite old; are you able to update to the latest container, 22.06?

cnegron-nv avatar Jul 20 '22 22:07 cnegron-nv

@PRIYANKArythem3 The version of Triton you are using is quite old; are you able to update to the latest container, 22.06?

Not able to run image 22.06 either. Mounting the same binaries as referenced above, it throws the following error (this library had no issues in Triton 21.10):

tritonserver: symbol lookup error: /triton/lib/_wordpiece_tokenizer.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb

Docker command used:

priyankasaraf@priyank-ltmatct triton % docker run -it \
 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
 -v /Users/priyankasaraf/repo/triton/selftest/model_dir/OutNonPyFuncFP32:/models \
 -v /Users/priyankasaraf/repo/triton/selftest/binaries:/triton/lib \
 --entrypoint /bin/bash \
 nvcr.io/nvidia/tritonserver:22.06-py3
root@7bac43d9115b:/opt/tritonserver# export LD_LIBRARY_PATH=/opt/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH 
root@7bac43d9115b:/opt/tritonserver# export LD_PRELOAD="/triton/lib/_sentencepiece_tokenizer.so /triton/lib/_normalize_ops.so /triton/lib/_regex_split_ops.so /triton/lib/_wordpiece_tokenizer.so"
root@7bac43d9115b:/opt/tritonserver# tritonserver --backend-config=tensorflow,version=2 --model-repository=/models \
> --model-control-mode=explicit \
> --log-verbose=5 --log-info=true --log-warning=true --log-error=true \
> --http-port=8000 --grpc-port=8001 \
> --metrics-port=8002
tritonserver: symbol lookup error: /triton/lib/_wordpiece_tokenizer.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb
root@7bac43d9115b:/opt/tritonserver# tritonserver --backend-config=tensorflow,version=2 --model-repository=/models --model-control-mode=explicit             

PRIYANKArythem3 avatar Jul 22 '22 16:07 PRIYANKArythem3

@cnegron-nv Any more suggestions/updates here?

PRIYANKArythem3 avatar Jul 29 '22 15:07 PRIYANKArythem3

For the new issue that you are seeing, I think it is because of the TF version update; you may need to rebuild your custom ops against the corresponding TF container.

As for your original issue: since Triton also has RE2 as a dependency, can you check which RE2 version your custom ops are compiled against, to see whether symbols from the wrong version are being used at runtime? For Triton we install libre2-5.
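
For example, one way to check at runtime (a sketch in Python; it assumes ldd is available inside the container and that the custom ops are mounted under /triton/lib as in the commands above):

import subprocess

custom_ops = [
    "/triton/lib/_sentencepiece_tokenizer.so",
    "/triton/lib/_normalize_ops.so",
    "/triton/lib/_regex_split_ops.so",
    "/triton/lib/_wordpiece_tokenizer.so",
]

for lib in custom_ops:
    # ldd shows which shared libraries (including libre2) each op resolves to.
    out = subprocess.run(["ldd", lib], capture_output=True, text=True).stdout
    re2_lines = [line.strip() for line in out.splitlines() if "re2" in line]
    print(lib, "->", re2_lines or "no RE2 dependency listed")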

GuanLuo avatar Aug 17 '22 00:08 GuanLuo

I'm getting the same issue as this on 22.08. I am not loading any custom ops. However, I do have tensorflow-cpu 2.7.0 installed via pip on the same server. Could this be the issue?

lminer avatar Sep 04 '22 22:09 lminer

@lminer Are you able to provide a backtrace? And do you mean that tensorflow-cpu 2.7.0 is installed while the TF libraries shipped with the Triton docker image are also available on the system? I don't have in-depth experience with TF, and I'm not sure whether it is possible to have different TF versions available and expect everything to work.

GuanLuo avatar Sep 07 '22 01:09 GuanLuo

That is indeed what I mean. How do I go about producing a backtrace? Interestingly, 21.04 works fine. What version of tensorflow is the 22.08 backend built from?

lminer avatar Sep 07 '22 01:09 lminer

22.08 ships TF 2.9.1 / 1.15.5. For a backtrace, you can build Triton with debug enabled and run it within a debugger. Also, what is your command to start the server? One thing to point out: since 22.08, Triton uses TF2 as the default framework for TF models (the previous default was TF1).

GuanLuo avatar Sep 12 '22 17:09 GuanLuo

Closing due to inactivity. Please let us know if you'd like the issue reopened to follow up.

the-david-oy avatar Sep 30 '22 22:09 the-david-oy

I'm still getting this error and I no longer have tensorflow installed.

Signal (11) received.
 0# 0x000055BDC5921459 in tritonserver
 1# 0x00007FB98F5B4090 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007FB9843339D8 in /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so
 3# 0x00007FB9842E8A74 in /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so
 4# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so
 5# 0x00007FB98FE6B19A in /opt/tritonserver/bin/../lib/libtritonserver.so
 6# 0x00007FB98FE6BB07 in /opt/tritonserver/bin/../lib/libtritonserver.so
 7# 0x00007FB98FF3F121 in /opt/tritonserver/bin/../lib/libtritonserver.so
 8# 0x00007FB98FE65E47 in /opt/tritonserver/bin/../lib/libtritonserver.so
 9# 0x00007FB98F9A5DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
10# 0x00007FB990D1D609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
11# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Here's the Dockerfile I'm using:

# Base image on the minimum Triton container
FROM nvcr.io/nvidia/tritonserver:22.12-py3

ARG audioshake_environment=dev
# Specify accept-bind-to-port LABEL for inference pipelines to use SAGEMAKER_BIND_TO_PORT
# https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-real-time.html
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true


# See http://bugs.python.org/issue19846
ENV LANG=C.UTF-8
ENV PYTHONDONTWRITEBYTECODE=1
# Python won’t try to write .pyc or .pyo files on the import of source modules
ENV PYTHONUNBUFFERED=1
ENV PATH="$PATH:/sagemaker"
ENV MODEL_BASE_PATH=/models
# The only required piece is the model name in order to differentiate endpoints
ENV MODEL_NAME=model
# Fix for the interactive mode during an install in step 21
ENV DEBIAN_FRONTEND=noninteractive

# allow unauthenticated and allow downgrades for special libcublas library
RUN apt-get update \
 && apt-get install -y --no-install-recommends --allow-unauthenticated --allow-downgrades --no-upgrade \
   libbz2-dev \
   liblzma-dev \
   libgomp1 \
    unzip \
   zlib1g-dev \
    openssl \
    libssl1.1 \
   libreadline-gplv2-dev \
   libncursesw5-dev \
   libsqlite3-dev \
   tk-dev \
   libgdbm-dev \
   libc6-dev \
    ffmpeg \
    python3 \
    python3-pip \
    python3-dev \
    libre2-5 \
    libb64-0d \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# nginx + njs
RUN apt-get update \
 && apt-get -y install --no-install-recommends \
    curl \
    gnupg2 \
 && curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - \
 && echo 'deb http://nginx.org/packages/ubuntu/ focal nginx' >> /etc/apt/sources.list \
 && apt-get update \
 && apt-get -y install --no-install-recommends \
    nginx \
    nginx-module-njs \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# cython, falcon, gunicorn, grpc
RUN pip3 install -U --no-cache-dir \
    awscli==1.22.42 \
    boto3==1.20.41 \
    cython==0.29.26 \
    falcon==2.0.0 \
    gunicorn==20.1.0 \
    gevent==21.1.1 \
    requests==2.27.1 \
    grpcio==1.34.1 \
    protobuf==3.14.0 \
    sentry-sdk[falcon]==1.9.5 \
    tritonclient==2.29.0 \
    ffmpeg-python==0.2.0

# Expose gRPC and REST port
EXPOSE 8500 8501

ENV PATH="/opt/tritonserver/bin:${PATH}"
ENV LD_LIBRARY_PATH="/opt/tritonserver/lib/libtritonserver.so:${LD_LIBRARY_PATH}"

lminer avatar Jan 18 '23 23:01 lminer

Hi @lminer ,

I no longer have tensorflow installed

Do you still have a tensorflow model in your model repository? It looks like the TF backend library is being loaded, which should only happen if there is a TF model present.
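
If it helps to double-check, a quick sketch (illustrative; it assumes a standard Triton model repository layout with one config.pbtxt per model, and the repository path is hypothetical) that lists which platform/backend each model in the repository declares:

import pathlib

repo = pathlib.Path("/path/to/model_repository")  # hypothetical; substitute your --model-repository value
for config in repo.glob("*/config.pbtxt"):
    fields = [line.strip() for line in config.read_text().splitlines()
              if line.strip().startswith(("platform:", "backend:"))]
    print(config.parent.name, "->", fields or "no platform/backend field")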

rmccorm4 avatar Jan 18 '23 23:01 rmccorm4

Wow, that was quick! Sorry, I should have specified. Previously I had tensorflow-cpu installed via pip. This time I removed the pip package and am just using the tensorflow backend to serve a tensorflow model. The tensorflow1 backend gives a similar error.

lminer avatar Jan 18 '23 23:01 lminer

Do you run into this issue with any TF model you produce, or is just one in particular giving issues? I would try with a very simple dummy model if any model produces the error.

If there are no custom ops or anything, I would expect the model to just work out of the box. Maybe there is still a weird versioning issue between where the model was generated (?) and where it's being loaded (Triton).

Can you try generating/saving your model in the corresponding nvcr.io/nvidia/tensorflow:22.12-py3 container? It should come with a TensorFlow Python package (and underlying TF library versions) that is compatible with the corresponding 22.12 Triton container.
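
As a starting point, a minimal dummy SavedModel along these lines could be saved inside that container (a sketch; the names, the <model_name>/<version>/model.savedmodel directory layout, and the trivial op are illustrative):

import tensorflow as tf

class DummyModel(tf.Module):
    @tf.function(input_signature=(tf.TensorSpec(shape=[1], dtype=tf.float32),))
    def predict(self, x):
        # Trivial elementwise op with a well-defined, non-scalar output shape.
        return x + 1.0

model = DummyModel()
tf.saved_model.save(
    model,
    "dummy/1/model.savedmodel",
    signatures=model.predict.get_concrete_function(),
)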

rmccorm4 avatar Jan 19 '23 00:01 rmccorm4

I've already tried generating it in the equivalent docker container and it didn't make a difference.

lminer avatar Jan 19 '23 00:01 lminer

Can you share a model that reproduces the issue and the corresponding tritonserver ... command?

rmccorm4 avatar Jan 19 '23 00:01 rmccorm4

I can't really share the model. Is there any more debugging output I might be able to provide?

lminer avatar Jan 19 '23 00:01 lminer

Do you run into this issue with any TF model you produce, or just one in particular is giving issues? I would try with a very simple dummy model if any model is producing the error.

Have you tried to reproduce with a simpler model?

rmccorm4 avatar Jan 19 '23 00:01 rmccorm4

@rmccorm4 It seems like it's crashing even with a very simple model:

import tensorflow as tf
from typing import List


class MyModel(tf.Module):
    def __init__(self):
        super().__init__()

    @tf.function(
        input_signature=(
            tf.TensorSpec(shape=[1], dtype=tf.string),
            tf.TensorSpec(shape=[1], dtype=tf.string),
        ),
        experimental_relax_shapes=True,
    )
    def predict(
        self,
        read_path: List[str],
        write_path: List[str],
    ) -> List[int]:
        tensor = tf.io.read_file(read_path[0])
        tensor = tf.io.parse_tensor(tensor, tf.float32)
        serialized_tensor = tf.io.serialize_tensor(tensor)
        tf.io.write_file(write_path[0], serialized_tensor)
        return [1]  # this is needed for autograph


model = MyModel()

tf.saved_model.save(
    model,
    "model.savedmodel",
    signatures=model.predict.get_concrete_function(),
)

The Triton Server command is: tritonserver --allow-grpc=true --grpc-port=9000 --allow-http=true --allow-metrics=false --http-port=8501 --model-repository=/opt/ml/model/model_repo --load-model=foo --model-control-mode=explicit --disable-auto-complete-config. I'm calling Triton Server from a Falcon server and am using gunicorn and nginx as well.

lminer avatar Jan 20 '23 01:01 lminer

Interestingly, it crashes even if I don't do any writing to disk in the predict method. Also, if I run a normal prediction model, there is a lot of GPU utilization before it crashes. Here's how I'm calling it:

from pathlib import Path

import numpy as np
from tritonclient.grpc import InferenceServerClient, InferInput

client = InferenceServerClient("localhost:9000")

# tmp_dir is defined elsewhere in the calling code
write_file_path = str(Path(tmp_dir).joinpath("foo.proto"))
read_file_path = str(Path(tmp_dir).joinpath("bar.proto"))

read_path = InferInput("read_path", shape=[1], datatype="BYTES")
read_path.set_data_from_numpy(np.array([read_file_path.encode("UTF-8")]))

write_path = InferInput("target_write", shape=[1], datatype="BYTES")
write_path.set_data_from_numpy(np.array([write_file_path.encode("UTF-8")]))

result = client.infer(
    "mymodel",
    [read_path, write_path],
)

lminer avatar Jan 24 '23 19:01 lminer

@rmccorm4 @GuanLuo. Sorry to be a pest, but any feedback you might be able to provide would be super helpful.

lminer avatar Jan 26 '23 22:01 lminer

I am able to reproduce the segfault you have observed, and I think there is a bug in the Triton TensorFlow backend; you can find more detail below. To unblock you, change your model output to return a tensor (rank > 0): return tf.constant([1])  # this is needed for autograph.

Detail: Your original model returns a scalar as output ([<tf.Tensor: shape=(), dtype=int32, numpy=1>]). Triton doesn't support scalars, so the backend is supposed to check for this and return an error saying scalars are not supported. However, there is a bug in the backend: the model is loaded and executed anyway, assuming the output shape contains valid data.
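
Applied to the model shared above, the workaround is just a change to the return value (a sketch; the essential change from the original is the last line):

import tensorflow as tf

class MyModel(tf.Module):
    @tf.function(
        input_signature=(
            tf.TensorSpec(shape=[1], dtype=tf.string),
            tf.TensorSpec(shape=[1], dtype=tf.string),
        ),
        experimental_relax_shapes=True,
    )
    def predict(self, read_path, write_path):
        tensor = tf.io.read_file(read_path[0])
        tensor = tf.io.parse_tensor(tensor, tf.float32)
        serialized_tensor = tf.io.serialize_tensor(tensor)
        tf.io.write_file(write_path[0], serialized_tensor)
        return tf.constant([1])  # rank-1 tensor instead of a Python scalar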

GuanLuo avatar Jan 27 '23 02:01 GuanLuo

@GuanLuo Thanks a bunch for looking into this. I figured it was something to do with the return value. Do you think the fix might make it into the next release?

lminer avatar Jan 27 '23 17:01 lminer

The issue will be tracked, but I don't have an exact timeline on when we can get to it. Just note that the fix I am referring to is not support for scalar I/O, but proper error reporting when such I/Os are detected.

GuanLuo avatar Jan 28 '23 00:01 GuanLuo

Got it. Thanks for clarifying. Any idea why it was working before 22.05?

lminer avatar Jan 28 '23 02:01 lminer