How to deal with while_loop or TensorArray in a TF model

Open rubyway opened this issue 6 years ago • 10 comments

Hi, the SSD postprocessor is built from a while loop and a TensorArray. How does TensorRT deal with control-flow ops (Enter, Merge, Switch, etc.)? And how should memory be allocated for the TensorArray object?

These ops have confused me for a long time. Many thanks!

rubyway avatar Jun 18 '19 12:06 rubyway

TF-TRT currently doesn't support loops, control ops, or TensorArray. TensorRT also doesn't support them at the moment.

We have observed that a much more efficient way of doing NMS in postprocessing is to use an op like CombinedNonMaxSuppression, which fuses many ops into a single op and which TF-TRT can also convert to TensorRT. We get a significant speedup on SSD with this.
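
For reference, a minimal sketch of what that looks like in TF. The shapes and thresholds below are illustrative placeholders, not values from any particular SSD model:

import tensorflow as tf

# Illustrative shapes: batch of 8 images, 1917 anchor boxes, 90 classes.
# boxes:  [batch, num_boxes, 1, 4]  (boxes shared across classes)
# scores: [batch, num_boxes, num_classes]
boxes = tf.placeholder(tf.float32, [8, 1917, 1, 4])
scores = tf.placeholder(tf.float32, [8, 1917, 90])

# One fused op replaces the while_loop + TensorArray postprocessor.
nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = \
    tf.image.combined_non_max_suppression(
        boxes, scores,
        max_output_size_per_class=100,
        max_total_size=100,
        iou_threshold=0.6,
        score_threshold=0.3)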

pooyadavoodi avatar Jul 19 '19 21:07 pooyadavoodi

@pooyadavoodi I have a custom model that uses a while_loop to iterate through a fixed number of possibilities. I'm encountering the same issue with TensorRT not supporting while_loop. Would TensorRT support a for loop implemented in TensorFlow 2.0? I'm currently testing the while_loop in TF 1.14.

Thanks!

jtressle avatar Aug 05 '19 23:08 jtressle

Supporting while_loop in TF-TRT is not planned yet. TensorRT itself also doesn't support loops or control ops yet.

Currently, if you optimize your model with TF-TRT, the while_loop stays in TF and the rest of your graph can still be optimized by TF-TRT, so you may already get a good speedup from that.
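
As a minimal sketch with the TF 1.14 Python API (the directory paths are placeholders):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Unsupported ops (e.g. the while_loop) stay in TF; supported
# subgraphs are replaced with TRTEngineOp nodes.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/models/my_model',    # placeholder path
    precision_mode='FP16')
converter.convert()
converter.save('/models/my_model_trt')           # placeholder path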

pooyadavoodi avatar Aug 29 '19 20:08 pooyadavoodi

@pooyadavoodi thanks for the tip. I reran my model through TF-TRT with the while loops in place, and conversion took about 1 minute. When I ran TF-TRT on the unrolled model (a fixed range of 64), the process ran for 3 hours and didn't complete.

Is this normal? Or could there be an issue with the cell-reuse flags in my model? It's a recurrent model, so there is added complexity there.

Would running the unrolled model through UFF be a worthwhile exercise?

Thanks in advance.

jtressle avatar Aug 30 '19 17:08 jtressle

@pooyadavoodi I went ahead and ran inference using the TF-TRT model with the while loop. The model is about 20% slower than my original (unoptimized) saved model.

I do get an error when creating the TRT model.

My while-loop implementation is basically:

cond = lambda index, *_: tf.less(index, FLAGS.steps)
_, state, max_output = tf.while_loop(
    cond, body,
    [index, state, max_output],
    back_prop=False, parallel_iterations=1)

The error is: layout failed: Invalid argument: MutableGraphView::SortTopologically error: detected edge(s) creating cycle(s)

and this error when running the model:

MutableGraphView::SortTopologically error: detected edge(s) creating cycle(s) {'while/Switch_5' -> 'while/Less_1', 'while/mul_2' -> 'while/add_44', 'while/Switch_4' -> 'while/mul_4',...

I'm running the nightly Docker image with GPU and Python 3 support. Unfortunately, the latest Docker image doesn't run TF-TRT.

Any ideas?

Thanks

jtressle avatar Aug 30 '19 18:08 jtressle

@pooyadavoodi Hi, I wanted to give you an update and get some thoughts.

For reference, I'm now using NVIDIA's 19.08-py3 TensorFlow build.

I went ahead and tried TF-TRT on the while loop. It looks like the TF-TRT TensorRTOptimizer is having an issue with the while loop; the nightly TF image had no issue with it.

Since my loop has recurrent components, could those be causing issues with TF-TRT?

tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
tensorflow/core/grappler/optimizers/meta_optimizer.cc:754]   constant folding: Graph size after: 1112 nodes (-131), 1569 edges (-159), time = 72.914ms.
tensorflow/core/grappler/optimizers/meta_optimizer.cc:754]   layout: Graph size after: 1614 nodes (502), 2104 edges (535), time = 36.082ms.
tensorflow/core/grappler/optimizers/meta_optimizer.cc:754]   constant folding: Graph size after: 1164 nodes (-450), 1655 edges (-449), time = 28.069ms.
tensorflow/core/grappler/optimizers/meta_optimizer.cc:754]   TensorRTOptimizer: Invalid argument: The graph couldn't be sorted in topological order.

jtressle avatar Sep 01 '19 00:09 jtressle

Hi @jtressle, have you been able to finish the TF-TRT process?

When I ran TF-TRT on the unrolled model (fixed range of 64), the TF-TRT process took 3 hours, and didn't complete.

I ask because I'm having a similar problem: the optimization never ends, and this is the last message I get:

2019-09-12 07:33:46.223752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10373 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
2019-09-12 07:33:48.347858: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 13 ops of 5 different types in the graph that are not converted to TensorRT: Identity, ResizeBilinear, ResizeNearestNeighbor, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2019-09-12 07:33:48.980015: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:733] Number of TensorRT candidate segments: 7
2019-09-12 07:33:50.247807: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-09-12 07:33:50.852589: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-12 07:33:51.909484: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10

I am also using a tf.while_loop, but I'm having the same problem even after removing it. It seems you have been able to run the conversion in a reasonable time. Any advice?

mmeendez8 avatar Sep 12 '19 07:09 mmeendez8

@mmeendez8 yes, I was able to get a faster conversion by using saved_model_cli to convert to a TF-TRT model. I believe it uses the older API to convert to TF-TRT. Can you confirm, @pooyadavoodi?
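
If it helps, the older TF 1.x API I mean is roughly this (a sketch; frozen_graph_def and the output node name are placeholders for your own model):

import tensorflow.contrib.tensorrt as trt

# Pre-TrtGraphConverter API; operates on a frozen GraphDef.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,    # placeholder: your frozen graph
    outputs=['output_node'],             # placeholder output name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')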

Another option you can try is to unroll the while loop (a simple for i in range(n): call), as in the sketch below. In my case, the while loop ran 256 iterations. Could this work for you?
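
Here body, index, state, and max_output are the same loop variables as in my earlier snippet:

# Unrolled equivalent of the tf.while_loop above: the loop body is
# traced n times at graph-construction time, so no control-flow ops
# (Enter/Merge/Switch) end up in the graph.
for _ in range(FLAGS.steps):
    index, state, max_output = body(index, state, max_output)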

jtressle avatar Sep 14 '19 15:09 jtressle

+@aaroey I'm not sure which version of TF-TRT saved_model_cli would use. I suppose the latest from master.

Looks like there are two problems we need to look at:

  • while_loop breaks the topological sort.
  • Unrolled loop for 64 steps increases the conversion time to 3 hours.

@jtressle could you post a repro example (both the TF-TRT API call and the model)?

pooyadavoodi avatar Sep 16 '19 16:09 pooyadavoodi

Supporting while_loop in TF-TRT is not planned yet. TensorRT itself also doesn't support loops or control ops yet.

Currently, if you optimize your model with TF-TRT, the while_loop stays in TF and the rest of your graph can still be optimized by TF-TRT, so you may already get a good speedup from that.

Hi @pooyadavoodi, we're having the same issue with TensorRT not supporting tf.while_loop. Is while_loop supported in TensorRT yet?

Some-random avatar Mar 30 '21 07:03 Some-random