
GRPC: unable to provide 'prob' in GPU, will use CPU

Open Chenhaolin6 opened this issue 2 years ago • 2 comments

I0902 06:24:42.600067 1 tensorrt.cc:5543] model 3dtest, instance 3dtest, executing 1 requests
I0902 06:24:42.600127 1 tensorrt.cc:1614] TRITONBACKEND_ModelExecute: Issuing 3dtest with 1 requests
I0902 06:24:42.600156 1 tensorrt.cc:1673] TRITONBACKEND_ModelExecute: Running 3dtest with 1 requests
I0902 06:24:42.600205 1 tensorrt.cc:2803] Optimization profile default [0] is selected for 3dtest
I0902 06:24:42.600272 1 pinned_memory_manager.cc:161] pinned memory allocation: size 4915200, addr 0x7feccc000090
I0902 06:24:42.601566 1 tensorrt.cc:2177] Context with profile default [0] is being executed for 3dtest
I0902 06:24:42.603102 1 infer_response.cc:167] add response output: output: prob, type: FP32, shape: [1,6001,1,1]
I0902 06:24:42.603153 1 grpc_server.cc:2581] GRPC: unable to provide 'prob' in GPU, will use CPU
I0902 06:24:42.603178 1 grpc_server.cc:2592] GRPC: using buffer for 'prob', size: 24004, addr: 0x7feb4cedf4b0
I0902 06:24:42.603206 1 pinned_memory_manager.cc:161] pinned memory allocation: size 24004, addr 0x7feccc4b00a0
I0902 06:24:42.606473 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0902 06:24:42.606546 1 grpc_server.cc:2712] GRPC free: size 24004, addr 0x7feb4cedf4b0
I0902 06:24:42.606788 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0902 06:24:42.606856 1 tensorrt.cc:2660] TRITONBACKEND_ModelExecute: model 3dtest released 1 requests
I0902 06:24:42.606842 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0902 06:24:42.606890 1 pinned_memory_manager.cc:190] pinned memory deallocation: addr 0x7feccc4b00a0
I0902 06:24:42.606946 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0902 06:24:42.606954 1 pinned_memory_manager.cc:190] pinned memory deallocation: addr 0x7feccc000090

I0902 06:24:42.619659 1 grpc_server.cc:270] Process for ModelReady, rpc_ok=1, 10 step START
I0902 06:24:42.619720 1 grpc_server.cc:225] Ready for RPC 'ModelReady', 11
I0902 06:24:42.619829 1 grpc_server.cc:270] Process for ModelReady, rpc_ok=1, 10 step COMPLETE
I0902 06:24:42.619859 1 grpc_server.cc:411] Done for ModelReady, 10

test time: 0.7890005111694336 0.4470024108886719 0.4459996223449707 0.45600080490112305 0.44300055503845215 0.45900774002075195 0.44899868965148926 0.4490010738372803 ....

tritonserver 22.06
nvidia Tesla T4
nvidia/cuda 11.0
docker run --gpus all --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=134217728 -p40001:8000 -p40002:8001 -p8002:8002 -v /home/dell/chen/triton-model/:/models -v /home/dell/chen/plugins:/plugins --env LD_PRELOAD=/plugins/v5.so --env LD_PRELOAD=/plugins/v6.so nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models --strict-model-config=false --log-verbose 1 --model-control-mode=explicit --load-model=* --allow-metrics=false
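(Note: passing `--env LD_PRELOAD=...` twice means the second value overrides the first, so only v6.so is actually preloaded by this command. If both plugin libraries are needed, they would have to be combined into a single colon-separated value, e.g. `--env LD_PRELOAD=/plugins/v5.so:/plugins/v6.so`.)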

config.pbtxt:
name: "3dtest"
platform: "tensorrt_plan"
max_batch_size: 8
input: {
        name: "data"
        data_type: TYPE_FP32
        dims: 3
        dims: 640
        dims: 640
    }
output: {
        name: "prob"
        data_type: TYPE_FP32
        dims: 6001
        dims: 1
        dims: 1
    }
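As a sanity check, the buffer sizes in the verbose log line up with this config at batch size 1 (FP32 is 4 bytes per element):

```python
# Byte sizes implied by config.pbtxt at batch size 1 (FP32 = 4 bytes/element).
input_bytes = 1 * 3 * 640 * 640 * 4   # 4,915,200 -> the pinned memory allocation in the log
output_bytes = 1 * 6001 * 1 * 1 * 4   # 24,004    -> the GRPC buffer size reported for 'prob'
```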

Reference: yolov5 v5.0
git clone -b v5.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git

Chenhaolin6 avatar Sep 02 '22 06:09 Chenhaolin6

Hi @Chenhaolin6, could you explain your issue in a little more detail, please? We recommend using our issue template to help out. Thanks.

nv-kmcgill53 avatar Sep 02 '22 16:09 nv-kmcgill53

If your question is about the fact that 'prob' ends up in CPU rather than GPU memory, that is expected. Your model produces the output on the GPU, but you are requesting it over the network with GRPC, and the networking stack is written to use CPU memory. So it is completely expected that data sent over the network has to be staged through the CPU.

If you are writing a custom backend and/or a client application, there are ways to keep the data on the GPU, for example by using CUDA shared memory to place the tensors in GPU memory.
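For reference, here is a minimal sketch of that approach with the Python GRPC client, adapted from Triton's simple_grpc_cuda_shm_client.py example. The model name, tensor names, and byte sizes come from the config in this issue; the region names and the zero-filled input are placeholders, and CUDA shared memory assumes the client runs on the same machine as the server:

```python
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import triton_to_np_dtype

# Sizes for batch 1, FP32 (see config.pbtxt above).
input_byte_size = 1 * 3 * 640 * 640 * 4   # 4,915,200
output_byte_size = 1 * 6001 * 1 * 1 * 4   # 24,004

client = grpcclient.InferenceServerClient(url="localhost:8001")
client.unregister_cuda_shared_memory()  # start from a clean slate

# Create CUDA shared memory regions on GPU 0 and register them with the server.
shm_in = cudashm.create_shared_memory_region("input_data", input_byte_size, 0)
shm_out = cudashm.create_shared_memory_region("output_data", output_byte_size, 0)
client.register_cuda_shared_memory("input_data", cudashm.get_raw_handle(shm_in), 0, input_byte_size)
client.register_cuda_shared_memory("output_data", cudashm.get_raw_handle(shm_out), 0, output_byte_size)

# Copy the input tensor into the GPU region (placeholder data here).
input_data = np.zeros((1, 3, 640, 640), dtype=np.float32)
cudashm.set_shared_memory_region(shm_in, [input_data])

# Point both input and output at the registered regions instead of the wire.
inputs = [grpcclient.InferInput("data", [1, 3, 640, 640], "FP32")]
inputs[0].set_shared_memory("input_data", input_byte_size)
outputs = [grpcclient.InferRequestedOutput("prob")]
outputs[0].set_shared_memory("output_data", output_byte_size)

results = client.infer(model_name="3dtest", inputs=inputs, outputs=outputs)

# Read 'prob' back out of the CUDA shared memory region.
prob_meta = results.get_output("prob")
prob = cudashm.get_contents_as_numpy(
    shm_out, triton_to_np_dtype(prob_meta.datatype), prob_meta.shape)

# Cleanup.
client.unregister_cuda_shared_memory()
cudashm.destroy_shared_memory_region(shm_in)
cudashm.destroy_shared_memory_region(shm_out)
```

With this, the GRPC response only references the registered region, so the server writes 'prob' directly into GPU memory instead of copying it out through a CPU buffer.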

dyastremsky avatar Sep 02 '22 19:09 dyastremsky

Closing this issue due to lack of activity. Please re-open it if you would like to follow up.

jbkyang-nvi avatar Nov 22 '22 03:11 jbkyang-nvi

> using CUDA shared memory to put the tensors into GPU.

Can you give some examples of "using CUDA shared memory to put the tensors into GPU"? Thanks!

dongteng avatar Dec 22 '23 10:12 dongteng

This question is not related to this issue; please create a new issue for future questions. That said, please refer to the client README for more information on CUDA shared memory.

jbkyang-nvi avatar Jan 11 '24 01:01 jbkyang-nvi