
Triton with PyTorch CPU-only build not working

Open ndeep27 opened this issue 1 year ago • 15 comments

Description
Triton Server with the PyTorch backend does not build for CPU only. It expects CUDA libraries such as libcudart.so even though the build was configured for CPU. Below is how we invoke the build. From another issue thread, we learned that the CPU-only build was fixed in v22.04 onwards.

python ./build.py --cmake-dir=$(pwd) --build-dir=/tmp/citritonbuild \
  --endpoint=http --endpoint=grpc \
  --enable-logging --enable-stats --enable-tracing --enable-metrics \
  --backend=pytorch:${tritonversion} \
  --repo-tag=common:${tritonversion} --repo-tag=core:${tritonversion} \
  --repo-tag=backend:${tritonversion} --repo-tag=thirdparty:${tritonversion} \
  --no-container-build \
  --extra-core-cmake-arg=TRITON_ENABLE_GPU=OFF \
  --extra-core-cmake-arg=TRITON_ENABLE_ONNXRUNTIME_TENSORRT=OFF \
  --extra-backend-cmake-arg=pytorch:TRITON_ENABLE_GPU=OFF \
  --upstream-container-version=22.04

Triton Information
What version of Triton are you using? 22.04

Are you using the Triton container or did you build it yourself? We built it ourselves.

To Reproduce
Steps to reproduce the behavior: mentioned above.

ndeep27 avatar Jul 19 '24 16:07 ndeep27

Hi @ndeep27, can you try one of the latest stable versions to see if the issue persists? 22.04 is more than two years old, and there has been a lot of development since then.

sourabh-burnwal avatar Jul 21 '24 15:07 sourabh-burnwal

@sourabh-burnwal Even the latest version (24.07) fails without giving any specific error. For instance, below is what I see in the log:

gmake[3]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-build'
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-stamp/grpc-install
[ 84%] Completed 'grpc'
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E make_directory /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/CMakeFiles
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/CMakeFiles/grpc-complete
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-stamp/grpc-done
gmake[2]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build'
[ 84%] Built target grpc
gmake[1]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build'
gmake: *** [all] Error 2

I am not sure what the exact error is here.
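As a side note for readers hitting the same wall: when a parallel gmake run ends with only a top-level `gmake: *** [all] Error 2`, that line is just propagation from a sub-make, and the real compile or link failure is buried earlier in the log. A hedged sketch of one way to surface it (the log file and its contents below are stand-ins, not from this build):

```shell
# Stand-in log; in practice capture the build output first, e.g.:
#   python ./build.py ... 2>&1 | tee build.log
cat > build.log <<'EOF'
[ 84%] Built target grpc
/usr/bin/ld: cannot find -lcudart
gmake[1]: *** [Makefile:136: all] Error 2
gmake: *** [all] Error 2
EOF

# The "Error 2" lines only report that a sub-make failed; search for the
# first real diagnostic instead (compiler errors, linker failures).
grep -n -m1 -E 'error:|Error 1|cannot find|undefined reference' build.log
```

Re-running the failing step serially (`make -j1`) also makes the first real error land immediately before the exit, instead of being interleaved with output from other jobs.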

ndeep27 avatar Jul 21 '24 23:07 ndeep27

@sourabh-burnwal Can you please help with the above?

ndeep27 avatar Jul 23 '24 19:07 ndeep27

@sourabh-burnwal Is there a CPU-only version of Triton with PyTorch released in open source?

ndeep27 avatar Jul 24 '24 14:07 ndeep27

Hi @ndeep27, is there any specific reason you want to build Triton yourself, and CPU-only at that? You can always control device access when starting the container or from the model config.

sourabh-burnwal avatar Jul 24 '24 17:07 sourabh-burnwal

@sourabh-burnwal We do control it via the model config, where we specify CPU, but the issue is that Triton libraries like libtorch_cpu need libcudart and other related CUDA libraries, which prevents us from running on CPU-only hosts.

ndeep27 avatar Jul 24 '24 19:07 ndeep27

@ndeep27, I have run the NGC Triton image on a CPU-only system without any problems. Can you share the exact issue you are seeing?

sourabh-burnwal avatar Jul 24 '24 19:07 sourabh-burnwal

@sourabh-burnwal Can you tell me the exact Docker image you used?

ndeep27 avatar Jul 24 '24 20:07 ndeep27

For instance, I downloaded nvcr.io/nvidia/tritonserver:24.07-pyt-python-py3, and when I open a shell in the container and run the commands below, I see libraries like libcudart linked:

root@031a384b7f38:/opt/tritonserver# ldd backends/pytorch/libtorch_cpu.so
	linux-vdso.so.1 (0x00007fff1d4cf000)
	libc10.so => /opt/tritonserver/backends/pytorch/libc10.so (0x00007c0f0a949000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007c0f0a93c000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007c0f0a91c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007c0f0a917000)
	libmkl_intel_lp64.so.1 => /opt/tritonserver/backends/pytorch/libmkl_intel_lp64.so.1 (0x00007c0f09a00000)
	libmkl_gnu_thread.so.1 => /opt/tritonserver/backends/pytorch/libmkl_gnu_thread.so.1 (0x00007c0f07a00000)
	libmkl_core.so.1 => /opt/tritonserver/backends/pytorch/libmkl_core.so.1 (0x00007c0eff800000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007c0f0a910000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007c0f0a829000)
	libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007c0f0a7df000)
	libcupti.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.12 (0x00007c0efee00000)
	libmpi.so.40 => /opt/hpcx/ompi/lib/libmpi.so.40 (0x00007c0f098e1000)
	libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x00007c0efea00000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007c0efe7d4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007c0efe5ab000)
	/lib64/ld-linux-x86-64.so.2 (0x00007c0f1699b000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007c0f0a7d8000)
	libopen-rte.so.40 => /opt/hpcx/ompi/lib/libopen-rte.so.40 (0x00007c0f07941000)
	libopen-pal.so.40 => /opt/hpcx/ompi/lib/libopen-pal.so.40 (0x00007c0efece9000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007c0f0a7ba000)

root@031a384b7f38:/opt/tritonserver# ldd lib/libtritonserver.so
	linux-vdso.so.1 (0x00007ffc46cfb000)
	libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x0000705fc1e05000)
	libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x0000705fc1a00000)
	libdcgm.so.3 => /lib/x86_64-linux-gnu/libdcgm.so.3 (0x0000705fc16a3000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000705fc1de9000)
	libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x0000705fc1d40000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x0000705fc15ff000)
	libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x0000705fc141d000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000705fc11f1000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000705fc110a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000705fc1d20000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000705fc0ee1000)
	/lib64/ld-linux-x86-64.so.2 (0x0000705fc32b1000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000705fc1d19000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000705fc1d14000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000705fc1d0f000)
	libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x0000705fc1ce5000)
	libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x0000705fc1cc4000)
	librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x0000705fc0ec2000)
	libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x0000705fc0e55000)
	libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x0000705fc0e41000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x0000705fc09fd000)
	libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x0000705fc09a9000)
	libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x0000705fc094a000)
	liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x0000705fc0939000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x0000705fc086a000)
	libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x0000705fc1cb2000)
	libicuuc.so.70 => /lib/x86_64-linux-gnu/libicuuc.so.70 (0x0000705fc066f000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x0000705fc0644000)
	libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x0000705fc049a000)
	libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x0000705fc02af000)
	libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x0000705fc0267000)
	libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x0000705fc0221000)
	libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x0000705fc019f000)
	libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x0000705fc00d2000)
	libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x0000705fc00a3000)
	libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x0000705fc009d000)
	libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x0000705fc008f000)
	libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x0000705fc0074000)
	libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x0000705fc0051000)
	libicudata.so.70 => /lib/x86_64-linux-gnu/libicudata.so.70 (0x0000705fbe431000)
	libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x0000705fbe2f6000)
	libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x0000705fbe2de000)
	libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x0000705fbe2d7000)
	libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x0000705fbe2c3000)
	libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x0000705fbe2b4000)

On CPU-only hosts these CUDA libraries are not available.
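To make the CUDA dependencies easy to spot in listings like the above, the `ldd` output can be filtered. A small sketch, using a stand-in excerpt of the output (on the real image you would pipe `ldd backends/pytorch/libtorch_cpu.so` directly instead of the sample file):

```shell
# Stand-in excerpt of the ldd output shown above.
cat > ldd_output.txt <<'EOF'
libcupti.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.12
libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
EOF

# Keep only the CUDA-related dependencies. On a host without the CUDA
# toolkit these entries show up as "not found" instead of a resolved path,
# unless the libraries are shipped alongside the binary (as in the image).
grep -E 'libcudart|libcupti|libnv|/usr/local/cuda' ldd_output.txt
```

Note that a dynamic dependency on libcudart.so does not by itself require a GPU; it only requires the shared library to be resolvable at load time, which is why the NGC image (which bundles the CUDA runtime) can still start on a CPU-only machine.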

ndeep27 avatar Jul 24 '24 20:07 ndeep27

We also cannot use these Docker images directly, since our OS is a variant of RHEL, so we have to build Triton from source for our OS.

ndeep27 avatar Jul 24 '24 20:07 ndeep27

@ndeep27 I see. Those files get shipped with the docker image.

Can you start a container from that 24.07 Docker image without giving it GPU device access, then try loading your PyTorch model after specifying the device as CPU in config.pbtxt?
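For reference, a minimal config.pbtxt sketch that pins a model to CPU; the model name, tensor names, and dims below are placeholders, not taken from this issue:

```protobuf
# Hypothetical model; adjust name, tensors, and dims to your own model.
name: "my_pytorch_model"
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
# Force all instances of this model onto CPU.
instance_group [
  {
    kind: KIND_CPU
    count: 1
  }
]
```

The container itself would then be started without GPU access, e.g. `docker run --rm -p 8000:8000 -v /path/to/models:/models nvcr.io/nvidia/tritonserver:24.07-pyt-python-py3 tritonserver --model-repository=/models` (no `--gpus` flag); the model repository path here is a placeholder.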

Regarding your use case of running this on a RHEL/CentOS-based system: as long as you have Docker installed and configured, you should be able to run it.

sourabh-burnwal avatar Jul 24 '24 21:07 sourabh-burnwal

@sourabh-burnwal What I mean is that we cannot directly use that Docker image (these are built for Ubuntu, but internally we have to build on RHEL due to security constraints). We have to build it from source. I am guessing these containers were also built from source, right? We need a way to build from source so that the result works on CPU too.

root@0d33724e583c:/opt/tritonserver# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

ndeep27 avatar Jul 24 '24 21:07 ndeep27

If I copy the files (under /opt/tritonserver) from this Docker image and add them to our custom inference pipeline, will that work?

ndeep27 avatar Jul 24 '24 21:07 ndeep27

> If I copy the files (under /opt/tritonserver) from this docker image and add it to our custom inference pipeline - will that work?

I don't think this will work, as the build might also contain system dependencies. I can try to reproduce your issue, but that will take some time as I am currently on Ubuntu.

sourabh-burnwal avatar Jul 30 '24 17:07 sourabh-burnwal

I encountered the same problem when building a CPU-only image; the command is:

python3 build.py --enable-logging --endpoint http --endpoint grpc --backend python

I built from the latest main branch code, version v2.50, and the build fails:

-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/port_platform.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/string_util.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_abseil.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_custom.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_generic.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_posix.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_windows.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/thd_id.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/time.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/workaround_list.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpc_authorization_provider.a
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc++/impl/codegen/config_protobuf.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/config_protobuf.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpc_plugin_support.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/ext/channelz_service_plugin.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpcpp_channelz.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libupb.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_cpp_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_csharp_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_node_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_objective_c_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_php_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_python_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_ruby_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCTargets.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCTargets-release.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCPluginTargets.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCPluginTargets-release.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCConfig.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCConfigVersion.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findc-ares.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findre2.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findsystemd.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/share/grpc/roots.pem
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/gpr.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc_unsecure.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc++.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc++_unsecure.pc
[ 87%] Completed 'grpc'
[ 87%] Built target grpc
gmake: *** [Makefile:136: all] Error 2
error: build failed

bitszhang avatar Oct 18 '24 01:10 bitszhang