Tensorflow-TensorRT

read_pb is slow

Open fugjo16 opened this issue 6 years ago • 12 comments

Dear author,

It's a great project, and the results are good!

But when I ran YOLOv3 with TensorRT on a TX2, it took a long time (about 10~20 minutes) to run read_pb_return_tensors(). Is this normal? I'm wondering whether I did something wrong...

Thanks

fugjo16 avatar Jan 19 '19 08:01 fugjo16

Hi,

Do you: (i) run all the block 2 code of this code file, or (ii) only run the function read_pb_graph("./model/YOLOv3/yolov3_gpu_nms.pb")? If (i), yes, it takes a longer time, since you also perform the TensorRT optimization. But once you have stored trt_model.pb, you can just do something similar to (ii) to load your stored trt_model.pb, and that only takes a few seconds (it also depends on your GPU). By the way, can you share how much improvement you get in terms of FPS after TRT optimization, and which GPU you use? I'm curious about that.
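
For clarity, step (ii) is roughly this (a minimal sketch assuming TF 1.x; the path and return value are illustrative):

```python
# Sketch of step (ii): load an already-stored, TRT-optimized frozen graph.
# Assumes TF 1.x; the .pb path is illustrative.
import tensorflow as tf

def read_pb_graph(pb_path):
    # Read the serialized GraphDef from disk and parse it.
    with tf.gfile.GFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

graph_def = read_pb_graph('./model/trt_model.pb')
with tf.Graph().as_default() as graph:
    # Import the parsed GraphDef into the current graph for inference.
    tf.import_graph_def(graph_def, name='')
```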

ardianumam avatar Jan 19 '19 09:01 ardianumam

Hi @ardianumam ,

The situation is (ii): it takes about 15 minutes to load the model. I run this code on a Jetson TX2, but with a 3rd-party carrier board. After loading finishes, I get about 9 FPS, versus about 4 FPS without TensorRT optimization. Maybe the problem is caused by the 3rd-party carrier board or by different package versions; I'll check it. Thanks for your reply.

fugjo16 avatar Jan 21 '19 03:01 fugjo16

@fugjo16 : Did you convert frozen_model.pb to TRT_model.pb on a desktop and then use it on the Jetson TX2? I once did something similar, and yes, it takes a very long time even just to load the TRT_model.pb. That workflow isn't really proper anyway, since TensorRT optimization generates a model optimized specifically for the machine on which the optimization is run.

If not, I wonder how you managed to convert frozen_model.pb to TRT_model.pb on the Jetson TX2, because I've tried several times and it always runs out of memory. -.-
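
For reference, by "convert" I mean roughly this (a minimal sketch assuming TF 1.x's contrib TensorRT module; the output node names and workspace size are illustrative, not the exact values from this repo):

```python
# Sketch: convert frozen_model.pb -> TRT_model.pb with TF-TRT (TF 1.x contrib).
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load the native frozen TensorFlow graph.
with tf.gfile.GFile('./model/frozen_model.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Run the TensorRT optimization; this is the memory-hungry step on TX2.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['output_boxes', 'output_scores'],  # hypothetical node names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,  # keep modest on TX2 to limit memory use
    precision_mode='FP16')             # TX2 supports fast FP16

# Store the optimized graph for later (fast-ish) reuse on the same machine.
with tf.gfile.GFile('./model/TRT_model.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())
```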

ardianumam avatar Jan 21 '19 03:01 ardianumam

@ardianumam: No, I converted to TRT_model.pb on the TX2. I use swap to get some more memory, as described in the link below. The swap is CPU memory, but it still helped. https://devtalk.nvidia.com/default/topic/1025939/jetson-tx2/when-i-run-a-tensorflow-model-there-is-not-enough-memory-what-shoud-i-do-/ Maybe this is why I need so much time to load the TRT_model...

fugjo16 avatar Jan 21 '19 05:01 fugjo16

@fugjo16 : I just learned about that. I'll try it later on my TX2 too, and will update here soon. Thanks. Yes, that's probably the cause.

ardianumam avatar Jan 21 '19 07:01 ardianumam

@ardianumam Thanks! This problem really confuses me a lot.

fugjo16 avatar Jan 22 '19 01:01 fugjo16

Hi @fugjo16 : I just tried on my TX2, and yes, it took about 15 minutes just to read the <tensorrt_model.pb>, while reading the native TensorFlow <frozen_model>.pb takes only about 5 seconds. lol. Maybe it's due to the swap memory used when performing the TensorRT optimization. I posted to the NVIDIA forum too; I hope someone replies. Or do you plan to, for example, reduce the YOLOv3 architecture so that we can perform the TensorRT optimization on the TX2 without adding swap memory?

ardianumam avatar Jan 24 '19 03:01 ardianumam

Hi @ardianumam: Thanks a lot! I hope someone answers it. lol. Yes, I think that method would work; I'll try it! Thanks :D

fugjo16 avatar Jan 24 '19 09:01 fugjo16

I'd rather say you're hit by the protobuf version/backend. Check: https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/

and start with: export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp before running your code. If that doesn't help, update protobuf. I rebuilt it from source.
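
You can check which backend Python actually picked up with something like this (note the env var must be exported before protobuf is first imported, which is why setting it inside your script is usually too late):

```python
# Quick check of the active protobuf backend:
# 'python' is the slow pure-Python parser, 'cpp' the fast native one.
from google.protobuf.internal import api_implementation
print(api_implementation.Type())  # expect 'cpp' after setting the env var
```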

filipski avatar Feb 18 '19 09:02 filipski

@filipski : thanks for the info. I'll give it a try.

ardianumam avatar Feb 19 '19 03:02 ardianumam

I tested with this blog's script. It's easy to modify, and it works for me. https://jkjung-avt.github.io/tf-trt-revisited/

fugjo16 avatar Jun 25 '19 05:06 fugjo16

@fugjo16 @ardianumam I have a YOLOv3 TensorFlow model in both ckpt and .pb format. The model runs on a GTX 1080 Ti at 37 FPS. Now I want to run it on a Xavier NX, but it is very slow, about 2 FPS. How can I optimize this model using TRT to make it faster on the Xavier NX? And how can I convert the .pb model to a .trt engine?

MuhammadAsadJaved avatar Sep 18 '20 10:09 MuhammadAsadJaved