Build a custom Python backend environment for an old TensorFlow model. How do I use a specific CUDA version in a conda environment?
Description
I have a model built on TensorFlow v1.15 (CUDA 10.0, cuDNN 7.4.1). I would like to register this model with the current Triton server (v22.03), whose default CUDA version is 11.6. I am trying to build a custom Python backend environment to run it.
Triton Information
What version of Triton are you using? Are you using the Triton container or did you build it yourself?
server_version = 2.20.0
Yes, I use the Triton container, version nvcr.io/nvidia/tritonserver:22.03-py3
To Reproduce
Steps to reproduce the behavior:
- Building the custom Python backend stub. Verifying it with
ldd triton_python_backend_stub gives:
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fbd60256000)
libarchive.so.13 => /usr/lib/x86_64-linux-gnu/libarchive.so.13 (0x00007fbd60193000)
libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fbd5feef000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbd5fee9000)
libpython3.6m.so.1.0 => /root/anaconda3/envs/python-36/lib/libpython3.6m.so.1.0 (0x00007fbd5fba2000)
libstdc++.so.6 => /root/anaconda3/envs/python-36/lib/libstdc++.so.6 (0x00007fbd5f98c000)
libgcc_s.so.1 => /root/anaconda3/envs/python-36/lib/libgcc_s.so.1 (0x00007fbd5f972000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbd5f94f000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fbd5f75d000)
libnettle.so.7 => /usr/lib/x86_64-linux-gnu/libnettle.so.7 (0x00007fbd5f723000)
libacl.so.1 => /usr/lib/x86_64-linux-gnu/libacl.so.1 (0x00007fbd5f718000)
liblzma.so.5 => /usr/lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fbd5f6ed000)
libzstd.so.1 => /usr/lib/x86_64-linux-gnu/libzstd.so.1 (0x00007fbd5f644000)
liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007fbd5f623000)
libbz2.so.1.0 => /usr/lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007fbd5f610000)
libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1 (0x00007fbd5f5f4000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fbd5f43a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbd602d3000)
libutil.so.1 => /usr/lib/x86_64-linux-gnu/libutil.so.1 (0x00007fbd5f433000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fbd5f2e4000)
libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007fbd5f0fe000)
libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007fbd5d63d000)
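For reference, a stub like the one verified above can be built roughly as described in the python_backend repository; the following is a sketch only, with the branch tag assumed to match the 22.03 container and run from inside an activated conda env:

```shell
# Sketch: build triton_python_backend_stub against the active conda env's
# Python, so the stub links libpython from that env (as in the ldd output above).
git clone https://github.com/triton-inference-server/python_backend -b r22.03
cd python_backend
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=ON \
      -DTRITON_BACKEND_REPO_TAG=r22.03 \
      -DTRITON_COMMON_REPO_TAG=r22.03 \
      -DTRITON_CORE_REPO_TAG=r22.03 \
      -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/install ..
make triton-python-backend-stub
ldd triton_python_backend_stub   # verify which libpython/libcudart got linked
```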
- Packaging the Conda Environment
I ran a simple test program with the custom Python interpreter
/root/anaconda3/envs/python-36/bin/python. It works.
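For completeness, the environment described here could be created and packed roughly as follows; this is a hedged sketch using the versions given in the description (env name and exact package channels are assumptions):

```shell
# Sketch: build and pack a conda env matching the description above
# (python 3.6, TF 1.15, cudatoolkit 10.0). cuDNN pinning is discussed
# later in this thread.
conda create -y -n python-36 python=3.6
conda activate python-36
conda install -y -c anaconda cudatoolkit=10.0
pip install tensorflow-gpu==1.15.2 conda-pack
conda-pack -o python36.tar.gz
```

The resulting tar file is then referenced from the model's config.pbtxt through the Python backend's EXECUTION_ENV_PATH parameter, which matches the "Using Python execution env /models/segmentation1/python36.tar.gz" line in the server log below.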
- However, when I load the model into the Triton Inference Server, I get the following error. The corresponding dynamic libraries load successfully, but it fails at the end.
infer_server | I0809 03:15:36.218548 1 python.cc:1618] Using Python execution env /models/segmentation1/python36.tar.gz
infer_server | I0809 03:15:36.218605 1 python.cc:1640] Input tensors can be both in CPU and GPU. FORCE_CPU_ONLY_INPUT_TENSORS is off.
infer_server | I0809 03:15:36.221956 1 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: segmentation1 (GPU device 0)
infer_server | Using TensorFlow backend.
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - tf module = /tmp/python_env_R6C0Zi/0/lib/python3.6/site-packages/tensorflow/__init__.py
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - tf module = /tmp/python_env_R6C0Zi/0/lib/python3.6/site-packages/tensorflow/__init__.py
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - tensorflow version = 1.15.2
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - tensorflow version = 1.15.2
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - cuda version = 10
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - cuda version = 10
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - cudnn version = 7
infer_server | 2022-08-09 03:16:03,424 - segmentation1 - INFO - cudnn version = 7
infer_server | 2022-08-09 03:16:03.426190: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
infer_server | 2022-08-09 03:16:03.431096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.431325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
infer_server | name: Quadro RTX 6000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
infer_server | pciBusID: 0000:13:00.0
infer_server | 2022-08-09 03:16:03.431593: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
infer_server | 2022-08-09 03:16:03.432881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
infer_server | 2022-08-09 03:16:03.434198: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
infer_server | 2022-08-09 03:16:03.434536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
infer_server | 2022-08-09 03:16:03.436053: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
infer_server | 2022-08-09 03:16:03.437196: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
infer_server | 2022-08-09 03:16:03.440340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
infer_server | 2022-08-09 03:16:03.440535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.440794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.440943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - GPU info [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - GPU info [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - set GPU memory growth
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - set GPU memory growth
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - set GPU memory growth222
infer_server | 2022-08-09 03:16:03,441 - segmentation1 - INFO - set GPU memory growth222
infer_server | 2022-08-09 03:16:03.441798: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
infer_server | 2022-08-09 03:16:03.451612: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394375000 Hz
infer_server | 2022-08-09 03:16:03.451911: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620bfefa1d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
infer_server | 2022-08-09 03:16:03.451929: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
infer_server | 2022-08-09 03:16:03.629311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.629566: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620c0234580 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
infer_server | 2022-08-09 03:16:03.629592: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Quadro RTX 6000, Compute Capability 7.5
infer_server | 2022-08-09 03:16:03.629803: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.629958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
infer_server | name: Quadro RTX 6000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
infer_server | pciBusID: 0000:13:00.0
infer_server | 2022-08-09 03:16:03.630015: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
infer_server | 2022-08-09 03:16:03.630027: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
infer_server | 2022-08-09 03:16:03.630037: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
infer_server | 2022-08-09 03:16:03.630047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
infer_server | 2022-08-09 03:16:03.630057: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
infer_server | 2022-08-09 03:16:03.630067: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
infer_server | 2022-08-09 03:16:03.630078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
infer_server | 2022-08-09 03:16:03.630152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.630338: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.630471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
infer_server | 2022-08-09 03:16:03.630826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
infer_server | 2022-08-09 03:16:03.630843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
infer_server | 2022-08-09 03:16:03.630849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
infer_server | 2022-08-09 03:16:03.630976: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.631179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.631359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 117 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:13:00.0, compute capability: 7.5)
infer_server | 2022-08-09 03:16:03.635199: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.635385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
infer_server | name: Quadro RTX 6000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
infer_server | pciBusID: 0000:13:00.0
infer_server | 2022-08-09 03:16:03.635434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
infer_server | 2022-08-09 03:16:03.635449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
infer_server | 2022-08-09 03:16:03.635460: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
infer_server | 2022-08-09 03:16:03.635470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
infer_server | 2022-08-09 03:16:03.635482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
infer_server | 2022-08-09 03:16:03.635493: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
infer_server | 2022-08-09 03:16:03.635504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
infer_server | 2022-08-09 03:16:03.635580: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.635776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03,636 - segmentation1 - INFO - segmentation1
infer_server | 2022-08-09 03:16:03,636 - segmentation1 - INFO - segmentation1
infer_server | 2022-08-09 03:16:03,636 - segmentation1 - INFO - Logger Set Up!
infer_server | 2022-08-09 03:16:03,636 - segmentation1 - INFO - Logger Set Up!
infer_server | 2022-08-09 03:16:03.635916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
infer_server | 2022-08-09 03:16:03.635936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
infer_server | 2022-08-09 03:16:03.635942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
infer_server | 2022-08-09 03:16:03.635948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
infer_server | 2022-08-09 03:16:03.636033: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.636221: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:03.636364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 117 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:13:00.0, compute capability: 7.5)
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - load model successfully
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - load model successfully
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - existed?
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - existed?
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - True
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - True
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - <models.model.MaskRCNN object at 0x7f3e391650b8>
infer_server | 2022-08-09 03:16:07,439 - segmentation1 - INFO - <models.model.MaskRCNN object at 0x7f3e391650b8>
infer_server | 2022-08-09 03:16:11.386943: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:11.387173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
infer_server | name: Quadro RTX 6000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
infer_server | pciBusID: 0000:13:00.0
infer_server | 2022-08-09 03:16:11.387259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
infer_server | 2022-08-09 03:16:11.387274: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
infer_server | 2022-08-09 03:16:11.387305: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
infer_server | 2022-08-09 03:16:11.387337: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
infer_server | 2022-08-09 03:16:11.387350: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
infer_server | 2022-08-09 03:16:11.387362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
infer_server | 2022-08-09 03:16:11.387376: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
infer_server | 2022-08-09 03:16:11.387481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:11.387980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:11.388155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
infer_server | 2022-08-09 03:16:11.388187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
infer_server | 2022-08-09 03:16:11.388194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
infer_server | 2022-08-09 03:16:11.388201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
infer_server | 2022-08-09 03:16:11.388371: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:11.388603: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
infer_server | 2022-08-09 03:16:11.388956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 117 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:13:00.0, compute capability: 7.5)
infer_server | 2022-08-09 03:16:13.364651: F ./tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: invalid configuration argument
Expected behavior
Could someone help me address this problem? Do I need to rebuild triton_python_backend_stub to point it at a specific CUDA version? How do I do that? I am not familiar with cmake >.<
My guess is that the service fails to start because triton_python_backend_stub links against CUDA 11.0, which is incompatible with what the model needs: libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fbd5feef000)
Hi @jennyHsiao, I think the CUDA library used by the triton_python_backend_stub can be different from the CUDA library used by the TF Python package. Are you installing CUDA using conda install -c anaconda cudatoolkit=11.2? If the CUDA installation is part of the conda environment, I think conda-pack should store it in the tar file.
I ran conda install -c anaconda cudatoolkit=10.0 and verified that it is stored in the tar file. I then ran my actual program rather than the simple test mentioned above. It reports the error below:
2022-08-10 07:03:01.659267: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.4.1 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
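The compatibility rule quoted in that error message can be written out explicitly. This is just an illustration of the stated rule, not TensorFlow's own check:

```python
def cudnn_compatible(loaded, compiled):
    """Illustrates the rule from the TensorFlow error above: major versions
    must match; from cuDNN 7.0 on, the loaded minor version may also be
    higher than the compile-time minor version."""
    (lmaj, lmin), (cmaj, cmin) = loaded[:2], compiled[:2]
    if lmaj != cmaj:
        return False
    if lmaj >= 7:
        return lmin >= cmin
    return lmin == cmin

# The situation from the log: runtime 7.4.1 vs. compiled-against 7.6.0
print(cudnn_compatible((7, 4, 1), (7, 6, 0)))  # → False
print(cudnn_compatible((7, 6, 5), (7, 6, 0)))  # → True
```

This is why installing cuDNN 7.6.5 (a higher minor patch than 7.6.0) resolves this particular error, as described below.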
Yes, I tried changing my cuDNN version; the error is probably due to something wrong with my changes.
I will keep working on the fix and update later. Thank you.
@Tabrizian found that the compatible cuDNN version is 7.6.5 rather than 7.4.1 or 7.6.0. After reinstalling with conda install cudnn==7.6.5, I can run a test program in my custom env on the Triton image.
When I start a Triton server and load my model, I find that my custom env is mounted at /tmp/python_env_9Zcf2g/0/bin/python. Running /tmp/python_env_9Zcf2g/0/bin/python test_program.py succeeds.
However, my TritonPythonModel initialization gets stuck at the line self.model.load_weights('/models/segmentation1/1/models/mask_rcnn_coco_0099.h5', by_name=True) and reports the following error:
infer_server | 2022-08-15 06:31:59.255701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22104 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:13:00.0, compute capability: 7.5)
infer_server | 2022-08-15 06:32:01.555046: F ./tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: invalid configuration argument
Why can the model load the .h5 weights in the custom env but not in TritonPythonModel?
import json
import logging
import logging.config
import os

import tensorflow as tf
import yaml

# modellib and InferenceConfig come from the user's Mask R-CNN package
# (the log shows the model object as models.model.MaskRCNN)
import models.model as modellib
from models.config import InferenceConfig


class TritonPythonModel:
    def initialize(self, args):
        # get paths
        self.module_path = args['model_repository']
        self.version_path = os.path.join(self.module_path, args['model_version'])
        # define logger
        logging_config_path = os.path.join(
            self.version_path, 'logging_config.yaml')
        with open(logging_config_path, 'r') as f:
            config = yaml.safe_load(f.read())
        logging.config.dictConfig(config)
        logging.captureWarnings(True)
        self.logger = logging.getLogger(args['model_name'])
        # GPU growth
        gpus = tf.config.experimental.list_physical_devices('GPU')
        self.logger.info(f'GPU info {gpus}')
        if gpus:
            try:
                # Currently, memory growth needs to be the same across GPUs
                for gpu in gpus:
                    tf.config.experimental.set_memory_growth(gpu, True)
                logical_gpus = tf.config.experimental.list_logical_devices('GPU')
                # Logger.info takes a single message, not print-style args
                self.logger.info(
                    f'{len(gpus)} Physical GPUs, {len(logical_gpus)} Logical GPUs')
            except RuntimeError as e:
                # Memory growth must be set before GPUs have been initialized
                self.logger.info(e)
        # model_config is passed as a JSON string and must be parsed
        self.model_config = json.loads(args['model_config'])
        self.logger.info(args['model_name'])
        self.logger.info('Logger Set Up!')
        self.model = modellib.MaskRCNN(mode="inference",
                                       config=InferenceConfig(),
                                       model_dir="")
        self.model.load_weights(
            '/models/segmentation1/1/models/mask_rcnn_coco_0099.h5', by_name=True)
        self.logger.info('load weight successfully')
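As a side note, every INFO line in the server logs above appears twice, which is consistent with the logger receiving two handlers (for example, one configured handler plus propagation to a root handler). A hypothetical logging_config.yaml matching the log format shown, with propagation disabled to avoid the duplication, might look like:

```yaml
# Hypothetical logging_config.yaml (assumption; not from the original post).
# propagate: false prevents each message from also reaching the root handler.
version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    formatter: simple
loggers:
  segmentation1:
    level: INFO
    handlers: [console]
    propagate: false
```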
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up on it.
