fastertransformer_backend

compile my own backend, libtriton_fastertransformer.so undefined symbol:

Open A-ML-ER opened this issue 1 year ago • 7 comments

Description

UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE8toStringB5c

I didn't change any ParallelGptTritonModel-related code, but when I start the Triton server it always fails.

Reproduced Steps

Steps to reproduce:
1.
docker build --rm   \
    --build-arg TRITON_VERSION=${CONTAINER_VERSION}   \
    -t ${TRITON_DOCKER_IMAGE} \
    -f docker/Dockerfile \
    .

2. start with 
CUDA_VISIBLE_DEVICES=0,1 /opt/tritonserver/bin/tritonserver  --model-repository=./triton-model-store/gptj/ &

3. build with
cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/tritonserver/backends/fastertransformer/ -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j 32 install
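
As a sanity check before starting the server (a minimal sketch, assuming the install prefix used above), the dynamic linker can be asked to resolve the freshly installed backend library; any unresolved symbol shows up here before Triton ever tries to load the model:

cd /opt/tritonserver/backends/fastertransformer
# ldd -r performs relocations and prints every symbol it cannot resolve
ldd -r libtriton_fastertransformer.so | grep -i "undefined symbol"
# list the symbols the library expects to import, demangled for readability
nm -D --undefined-only libtriton_fastertransformer.so | c++filt | grep ParallelGptTritonModel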

A-ML-ER avatar Apr 04 '23 13:04 A-ML-ER

I0404 14:43:41.957637 63955 server.cc:594]
+-------------------+---------+------------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                               |
+-------------------+---------+------------------------------------------------------------------------------------------------------+
| fastertransformer | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer |
|                   |         | /libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE8toStringB5c |

A-ML-ER avatar Apr 04 '23 14:04 A-ML-ER

Did you change any code?

byshiue avatar Apr 06 '23 06:04 byshiue

I had the same problem. Yes, I changed code, just to add support for a new model. This symbol is not found in libtriton_fastertransformer.so:

nm -D libtriton_fastertransformer.so | grep ParallelGptTritonModel | grep toString

But it is found in libtransformer-shared.so. I see the same thing without modifying the code, yet in that case no error is reported.
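
For reference, a sketch of that comparison with demangled names (paths assume the default install location of the backend):

cd /opt/tritonserver/backends/fastertransformer
# in the backend library the symbol is at best an undefined reference (U), if present at all
nm -D libtriton_fastertransformer.so | c++filt | grep 'ParallelGptTritonModel<__half>::toString'
# in the FasterTransformer shared library a definition should show up as T or W
nm -D libtransformer-shared.so | c++filt | grep 'ParallelGptTritonModel<__half>::toString'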

How can I fix this?

zhaohb avatar Apr 12 '23 12:04 zhaohb

Did you add the new model in https://github.com/NVIDIA/FasterTransformer/blob/main/CMakeLists.txt#L317?

byshiue avatar Apr 13 '23 00:04 byshiue

@byshiue Yes, I added the new model to transformer-shared, and I added some code under src/fastertransformer/triton_backend, such as the TritonModel and TritonModelInstance classes. I also added my new model to src/libfastertransformer.cc in the fastertransformer_backend repo. The code itself looks fine.
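
If it helps, a hedged way to double-check the wiring from the shell (MyNewModel / MyNewModelTritonModel are placeholders for the actual names) is to confirm the new target is referenced by the build files and to see whether the installed backend still imports its symbols unresolved:

# from the FasterTransformer source root: is the new Triton model target referenced in the build files?
grep -rn "MyNewModelTritonModel" CMakeLists.txt src/fastertransformer/triton_backend/
# does the installed backend library still expect (rather than define) the new symbols?
nm -D --undefined-only /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | c++filt | grep MyNewModel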

zhaohb avatar Apr 13 '23 01:04 zhaohb

@A-ML-ER Have you solved the problem?

zhaohb avatar Apr 13 '23 01:04 zhaohb

@byshiue I have the same problem. I followed https://github.com/triton-inference-server/fastertransformer_backend#rebuilding-fastertransformer-backend-optional:

cmake \
      -D CMAKE_EXPORT_COMPILE_COMMANDS=1 \
      -D CMAKE_BUILD_TYPE=Release \
      -D ENABLE_FP8=OFF \
      -D BUILD_MULTI_GPU=ON \
      -D BUILD_PYT=ON \
      -D SM=80 \
      -D CMAKE_INSTALL_PREFIX=/opt/tritonserver \
      -D TRITON_COMMON_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      -D TRITON_CORE_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      -D TRITON_BACKEND_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      ..

I need to use BUILD_PYT=ON.

But the same error occurred.

 UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE8toStringB5cxx1
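
One thing worth ruling out (a sketch; paths assume the default install layout, and the first check only applies if the backend links libtransformer-shared.so dynamically) is that the copy of libtransformer-shared.so picked up at load time is a stale one from the original image rather than the one just rebuilt:

# which libtransformer-shared.so will the dynamic loader actually pick up?
ldd /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | grep transformer-shared
# does that copy export the missing symbol? (a definition shows up as T or W)
nm -D /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so | c++filt | grep 'ParallelGptTritonModel<__half>::toString'
# if not, re-run make install so the rebuilt library replaces the stale one, then restart tritonserver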

lkm2835 avatar May 18 '23 00:05 lkm2835