GRPC: unable to provide 'prob' in GPU, will use CPU
I0902 06:24:42.600067 1 tensorrt.cc:5543] model 3dtest, instance 3dtest, executing 1 requests
I0902 06:24:42.600127 1 tensorrt.cc:1614] TRITONBACKEND_ModelExecute: Issuing 3dtest with 1 requests
I0902 06:24:42.600156 1 tensorrt.cc:1673] TRITONBACKEND_ModelExecute: Running 3dtest with 1 requests
I0902 06:24:42.600205 1 tensorrt.cc:2803] Optimization profile default [0] is selected for 3dtest
I0902 06:24:42.600272 1 pinned_memory_manager.cc:161] pinned memory allocation: size 4915200, addr 0x7feccc000090
I0902 06:24:42.601566 1 tensorrt.cc:2177] Context with profile default [0] is being executed for 3dtest
I0902 06:24:42.603102 1 infer_response.cc:167] add response output: output: prob, type: FP32, shape: [1,6001,1,1]
I0902 06:24:42.603153 1 grpc_server.cc:2581] GRPC: unable to provide 'prob' in GPU, will use CPU
I0902 06:24:42.603178 1 grpc_server.cc:2592] GRPC: using buffer for 'prob', size: 24004, addr: 0x7feb4cedf4b0
I0902 06:24:42.603206 1 pinned_memory_manager.cc:161] pinned memory allocation: size 24004, addr 0x7feccc4b00a0
I0902 06:24:42.606473 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0902 06:24:42.606546 1 grpc_server.cc:2712] GRPC free: size 24004, addr 0x7feb4cedf4b0
I0902 06:24:42.606788 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0902 06:24:42.606856 1 tensorrt.cc:2660] TRITONBACKEND_ModelExecute: model 3dtest released 1 requests
I0902 06:24:42.606842 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0902 06:24:42.606890 1 pinned_memory_manager.cc:190] pinned memory deallocation: addr 0x7feccc4b00a0
I0902 06:24:42.606946 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0902 06:24:42.606954 1 pinned_memory_manager.cc:190] pinned memory deallocation: addr 0x7feccc000090
I0902 06:24:42.619659 1 grpc_server.cc:270] Process for ModelReady, rpc_ok=1, 10 step START
I0902 06:24:42.619720 1 grpc_server.cc:225] Ready for RPC 'ModelReady', 11
I0902 06:24:42.619829 1 grpc_server.cc:270] Process for ModelReady, rpc_ok=1, 10 step COMPLETE
I0902 06:24:42.619859 1 grpc_server.cc:411] Done for ModelReady, 10
test time: 0.7890005111694336 0.4470024108886719 0.4459996223449707 0.45600080490112305 0.44300055503845215 0.45900774002075195 0.44899868965148926 0.4490010738372803 ....
Triton Server: 22.06
GPU: NVIDIA Tesla T4
CUDA: nvidia/cuda 11.0
docker run --gpus all --rm --ipc=host --shm-size=1g \
  --ulimit memlock=-1 --ulimit stack=134217728 \
  -p40001:8000 -p40002:8001 -p8002:8002 \
  -v /home/dell/chen/triton-model/:/models \
  -v /home/dell/chen/plugins:/plugins \
  --env LD_PRELOAD=/plugins/v5.so:/plugins/v6.so \
  nvcr.io/nvidia/tritonserver:22.06-py3 \
  tritonserver --model-repository=/models --strict-model-config=false \
    --log-verbose 1 --model-control-mode=explicit --load-model=* \
    --allow-metrics=false
config.pbtxt
name: "3dtest"
platform: "tensorrt_plan"
max_batch_size: 8
input: {
name: "data"
data_type: TYPE_FP32
dims: 3
dims: 640
dims: 640
}
output:{
name: "prob",
data_type: TYPE_FP32
dims: 6001
dims: 1
dims: 1
}
Reference: YOLOv5 v5.0
git clone -b v5.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git
Hi @Chenhaolin6, could you explain your issue a little more, please? We recommend using our issue template to help out. Thanks.
If your question is about the fact that 'prob' ends up in CPU memory rather than GPU memory, that's expected. Your model likely expects the output in GPU memory, but you're sending it over the network with GRPC, and those software stacks are written to use CPU memory. So it is completely expected that tensors crossing the network have to pass through CPU memory.
If you are writing a custom backend and/or writing a client application, there are ways to keep the data on the GPU. For example, using CUDA shared memory to put the tensors into GPU.
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up.
"using CUDA shared memory to put the tensors into GPU."

Can you give some examples of this? Thanks!
This question is not related to the original issue; please create a new issue for future questions. That being said, please refer to the client README for more information on CUDA shared memory.
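For a concrete starting point, below is a minimal, untested sketch modeled on the simple_grpc_cuda_shm_client.py example in the client repo. It keeps this thread's output 'prob' in a CUDA shared memory region on GPU 0 instead of having it copied back over gRPC. The endpoint, region names, and the dummy zero input are illustrative (the docker command above maps gRPC port 8001 to host port 40002), so adjust them to your deployment. Note that CUDA shared memory only helps when the client runs on the same machine as the server and can access the same GPU.

import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import triton_to_np_dtype

# Endpoint is illustrative; this thread's docker command maps 8001 -> 40002 on the host.
client = grpcclient.InferenceServerClient("localhost:40002")

# Byte sizes for one batch-1 request against the config above (FP32 = 4 bytes).
input_byte_size = 1 * 3 * 640 * 640 * 4    # "data" -> 4915200 bytes
output_byte_size = 1 * 6001 * 1 * 1 * 4    # "prob" -> 24004 bytes

# Create CUDA shared memory regions on GPU 0 and register them with the server.
client.unregister_cuda_shared_memory()  # no name given: clears any stale regions
in_handle = cudashm.create_shared_memory_region("data_shm", input_byte_size, 0)
out_handle = cudashm.create_shared_memory_region("prob_shm", output_byte_size, 0)
client.register_cuda_shared_memory("data_shm", cudashm.get_raw_handle(in_handle), 0, input_byte_size)
client.register_cuda_shared_memory("prob_shm", cudashm.get_raw_handle(out_handle), 0, output_byte_size)

# Copy a (dummy) input into the GPU region and point the request at both regions.
image = np.zeros((1, 3, 640, 640), dtype=np.float32)
cudashm.set_shared_memory_region(in_handle, [image])

infer_input = grpcclient.InferInput("data", [1, 3, 640, 640], "FP32")
infer_input.set_shared_memory("data_shm", input_byte_size)
infer_output = grpcclient.InferRequestedOutput("prob")
infer_output.set_shared_memory("prob_shm", output_byte_size)

results = client.infer(model_name="3dtest", inputs=[infer_input], outputs=[infer_output])

# 'prob' now lives in the GPU region; copy it out to numpy only if the host needs it.
prob_meta = results.get_output("prob")
prob = cudashm.get_contents_as_numpy(
    out_handle, triton_to_np_dtype(prob_meta.datatype), prob_meta.shape)
print(prob.shape)  # (1, 6001, 1, 1)

# Clean up the regions when done.
client.unregister_cuda_shared_memory()
cudashm.destroy_shared_memory_region(in_handle)
cudashm.destroy_shared_memory_region(out_handle)

The same pattern works with system (CPU) shared memory via tritonclient.utils.shared_memory if the client is co-located with the server but you only want to avoid the gRPC serialization copy, not the GPU-to-CPU transfer.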