TFSegmentation

4 fps on GTX 1080 Ti

Open anusrees opened this issue 6 years ago • 15 comments

screenshot from 2018-11-12 15-57-59

I am testing the segmentation on my Nvidia GTX 1080 Ti but am getting only 4 fps. We are running the following command for the test: `python main.py --load_config=unet_shufflenet_test.yaml test Train UNetShuffleNet`. Using `inference` in place of `test` doesn't work.

anusrees avatar Nov 12 '18 10:11 anusrees

First, you have to use inference mode; this calls the test_inference method in train/train.py. Second, the most efficient architecture we used is SkipNet ShuffleNet, not UNet; it provides 143 fps on a TITAN X Pascal with the optimization. If you want to use UNet, it means much more computation, because each downsampling stage has a corresponding upsampling stage and it works in the feature space, so transposed convolution is applied to features with a large number of channels. Third, if you want the best performance, you also need to run ./optimize.sh after saving graph.pb.

MSiam avatar Nov 12 '18 14:11 MSiam

Thanks for the suggestion, I will try SkipNet ShuffleNet. I have a few more queries, though. There was an exit(1) on line 897 in train.py inside the test_inference function. I commented it out and got the exception below, which is why I timed the feed-forward code in the test method instead. Also, could you tell me the content of optimize.sh? There seems to be no file with that name in the repository.

    0%| | 0/8 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
        return fn(*args)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
        status, run_metadata)
      File "/home/software-team/anaconda3/lib64/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
      [[Node: network/input/Placeholder_2/_5709 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_794_network/input/Placeholder_2", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "main.py", line 19, in <module>
        main()
      File "main.py", line 15, in main
        agent.run()
      File "/home/software-team/Raghu/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/software-team/Raghu/TFSegmentation/agent.py", line 100, in run
        self.inference()
      File "/home/software-team/Raghu/TFSegmentation/agent.py", line 165, in inference
        self.operator.test_inference()
      File "/home/software-team/Raghu/TFSegmentation/train/train.py", line 935, in test_inference
        feed_dict=feed_dict)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
        run_metadata_ptr)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
        options, run_metadata)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
      [[Node: network/input/Placeholder_2/_5709 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_794_network/input/Placeholder_2", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    Caused by op 'network/input/Placeholder_2', defined at:
      File "main.py", line 19, in <module>
        main()
      File "main.py", line 15, in main
        agent.run()
      File "/home/software-team/Raghu/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/software-team/Raghu/TFSegmentation/agent.py", line 85, in run
        self.build_model()
      File "/home/software-team/Raghu/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/software-team/Raghu/TFSegmentation/agent.py", line 64, in build_model
        self.model.build()
      File "/home/software-team/Raghu/TFSegmentation/models/unet_shufflenet.py", line 16, in build
        self.init_input()
      File "/home/software-team/Raghu/TFSegmentation/models/basic/basic_model.py", line 116, in init_input
        self.is_training = tf.placeholder(tf.bool)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1548, in placeholder
        return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2094, in _placeholder
        name=name)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/home/software-team/Raghu/TFSegmentation/tfenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
      [[Node: network/input/Placeholder_2/_5709 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_794_network/input/Placeholder_2", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

anusrees avatar Nov 13 '18 07:11 anusrees

I tried SkipNet ShuffleNet but I am still getting only 5 fps. I ran the following command: `python main.py --load_config=fcn8s_shufflenet_test.yaml test Train FCN8sShuffleNet`

anusrees avatar Nov 14 '18 09:11 anusrees

To run inference mode, just uncomment line 919, which sets the is_training placeholder to False. It was commented out because I removed this variable when I was doing the optimization and didn't want it in graph.pb. As for optimize.sh, it's in the optimize_inference branch.

MSiam avatar Nov 14 '18 15:11 MSiam

I tried what you suggested and it gave me a significant improvement, to 25 fps. However, running optimize.sh does not improve performance any further; I still get around 25 fps afterwards.

I am running the following sequence of commands:

    python main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet
    sh optimize.sh
    python main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet

Could you please tell me where I might be going wrong?

anusrees avatar Nov 15 '18 06:11 anusrees

That's still very slow on a 1080 Ti! You're probably using a higher image resolution than the one we measured with. Regardless, please refer to this issue for further help on what to do: Issue20

MSiam avatar Nov 15 '18 15:11 MSiam

I am getting 67 FPS with images of size 704x576. Thanks for your help.

anusrees avatar Nov 16 '18 09:11 anusrees
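The FPS figures quoted in this thread come from timing the forward pass (train.py brackets `sess.run` with `time.time()`). A minimal sketch of such a measurement follows. It is not the repository's code: `measure_fps`, `n_warmup`, and `n_runs` are hypothetical names, and the warm-up loop is an assumption based on the first few calls usually carrying one-off setup cost.

```python
import time


def measure_fps(run_once, n_warmup=5, n_runs=50):
    """Return average calls per second of run_once(), excluding warm-up calls.

    run_once is any zero-argument callable; in practice it would wrap a
    single sess.run(...) on one image (an assumption, not the repo's API).
    """
    # Warm-up calls are executed but not timed.
    for _ in range(n_warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start

    return n_runs / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleeps about 10 ms per "frame".
    fps = measure_fps(lambda: time.sleep(0.01), n_warmup=2, n_runs=20)
    print("%.1f fps" % fps)
```

Averaging over many runs on a fixed image size makes numbers from different branches or GPUs easier to compare, since throughput in this thread clearly depends on resolution.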

@anusrees @MSiam Hello, I downloaded optimize.sh and infer_optimize.py from the optimize_inference branch and copied them into the master branch. Then I ran the following commands:

    python main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet
    sh optimize.sh
    python infer_optimize.py --graph graph_optimized.pb

The first two steps are OK and produce graph.pb and graph_optimized.pb. But when I run the third command, I get: InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 691200 values, but the requested shape has 1. I did not change the code anywhere.

Could you please tell me how you get the FPS, and whether you have run into this problem?

MMY1994 avatar Nov 25 '18 09:11 MMY1994
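One way to read the number in that reshape error: 691200 is exactly the element count of a single 360x640 image with 3 channels, the image size MMY1994 reports using later in the thread. The three-channel (RGB) layout is an assumption, and the thread does not confirm which tensor is being reshaped, so treat this as a quick sanity check only.

```python
# Element count of one image, assuming height x width x channels layout.
height, width, channels = 360, 640, 3  # channels=3 is an assumption (RGB)

values = height * width * channels
print(values)  # 691200
```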

@MMY1994 I added this line to infer_optimize.py: `is_training = G.get_tensor_by_name('import/network/input/Placeholder_2:0')`, just below `x = G.get_tensor_by_name('import/network/input/Placeholder:0')`, and added `is_training: False` after `x: img` in the `sess.run(...)` call.

anusrees avatar Nov 26 '18 13:11 anusrees
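Put together, the change described above could look roughly like the sketch below. It is not the actual infer_optimize.py: the function `run_inference` and its `output_name` parameter are hypothetical, because the thread never states the output tensor's name. Only the two placeholder names and the `is_training: False` feed come from the comment. `sess` and `graph` are expected to behave like a TF 1.x `tf.Session` and `tf.Graph`.

```python
def run_inference(sess, graph, img, output_name):
    """Run one forward pass on an imported frozen graph.

    sess / graph: objects with the tf.Session.run and
    tf.Graph.get_tensor_by_name interfaces (TF 1.x).
    output_name: name of the output tensor; hypothetical argument,
    since the thread does not give it.
    """
    # Image placeholder, as named in the comment above.
    x = graph.get_tensor_by_name('import/network/input/Placeholder:0')
    # Boolean is_training placeholder that must also be fed.
    is_training = graph.get_tensor_by_name('import/network/input/Placeholder_2:0')
    y = graph.get_tensor_by_name(output_name)

    # Feeding is_training=False avoids the
    # "You must feed a value for placeholder tensor" error.
    return sess.run(y, feed_dict={x: img, is_training: False})
```

Passing the session and graph in as arguments keeps the sketch independent of how the `.pb` file is loaded.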

@anusrees @MSiam I got the FPS. Thank you so much!

But why do I get 37 FPS with 360x640 images on the optimize_inference branch and 34 FPS with the same image size on master? There is no big difference between them. I tested on an NVIDIA 1050 Ti.

On master it is surprisingly fast, considering that I only run this command: `python main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet`

MMY1994 avatar Nov 26 '18 13:11 MMY1994

@MMY1994 @MSiam Hi, I am also seeing a difference of 8 fps between the optimize_inference branch and the master branch, with optimize_inference being faster; I'm not sure why. My initial image size is 704x576.

anusrees avatar Nov 29 '18 09:11 anusrees

@anusrees I got it! Thank you for your reply.

MMY1994 avatar Nov 29 '18 14:11 MMY1994

@MMY1994 Have you been able to figure out why there is a difference?

anusrees avatar Nov 29 '18 16:11 anusrees

@anusrees As I explained in another issue, I'm having trouble running `python3 main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet` because some .npy files do not exist (X_test.npy, etc.). If I rename some of the other .npy files to fake the data I need and get past those errors, I hit a KeyError, which is explained in issue #61.

What I discovered is that if I modify the train.py code like this from line 906 downward:

        x_batch = self.test_data['X'][idx:idx + 1]
        # y_batch = self.test_data['Y'][idx:idx + 1]

        # update idx of mini_batch
        idx += 1

        # Feed this variables to the network
        if self.args.random_cropping:
            feed_dict = {self.test_model.x_pl_before: x_batch
                      #   self.test_model.y_pl_before: y_batch
                         #self.test_model.is_training: False
                         }
        else:
            feed_dict = {self.test_model.x_pl: x_batch
                      #   self.test_model.y_pl: y_batch
                         #self.test_model.is_training: False
                         }

        # calculate the time of one inference
        start = time.time()

        # run the feed_forward
        _ = self.sess.run(
            [self.test_model.out_argmax],
            feed_dict=feed_dict)

I get an error very similar to the one you posted:

    0%| | 0/2975 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
        return fn(*args)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
        status, run_metadata)
      File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "main.py", line 19, in <module>
        main()
      File "main.py", line 15, in main
        agent.run()
      File "/home/alejandro/Segmentation/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/alejandro/Segmentation/TFSegmentation/agent.py", line 100, in run
        self.inference()
      File "/home/alejandro/Segmentation/TFSegmentation/agent.py", line 165, in inference
        self.operator.test_inference()
      File "/home/alejandro/Segmentation/TFSegmentation/train/train.py", line 930, in test_inference
        feed_dict=feed_dict)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
        options, run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    Caused by op 'network/input/Placeholder_2', defined at:
      File "main.py", line 19, in <module>
        main()
      File "main.py", line 15, in main
        agent.run()
      File "/home/alejandro/Segmentation/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/alejandro/Segmentation/TFSegmentation/agent.py", line 85, in run
        self.build_model()
      File "/home/alejandro/Segmentation/TFSegmentation/utils/misc.py", line 18, in timed
        result = f(*args, **kwargs)
      File "/home/alejandro/Segmentation/TFSegmentation/agent.py", line 64, in build_model
        self.model.build()
      File "/home/alejandro/Segmentation/TFSegmentation/models/fcn8s_shufflenet.py", line 22, in build
        self.init_input()
      File "/home/alejandro/Segmentation/TFSegmentation/models/basic/basic_model.py", line 116, in init_input
        self.is_training = tf.placeholder(tf.bool)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1548, in placeholder
        return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2094, in _placeholder
        name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'network/input/Placeholder_2' with dtype bool
      [[Node: network/input/Placeholder_2 = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Could you please tell me how you resolved this problem so that the program runs to completion? I assume you solved it, because your next comment shows the program's output. If you have any clue about why I'm hitting the earlier errors that force me to modify the code, that would also be very helpful. If @MSiam has any ideas for solving these problems, that would be great too.

Thank you in advance

Zitzo avatar Dec 18 '18 18:12 Zitzo

@Zitzo this is what my code looks like:

        x_batch = self.test_data['X'][idx:idx + 1]
        y_batch = self.test_data['Y'][idx:idx + 1]

        # update idx of mini_batch
        idx += 1

        # Feed this variables to the network
        if self.args.random_cropping:
            feed_dict = {self.test_model.x_pl_before: x_batch,
                         self.test_model.y_pl_before: y_batch,
                         self.test_model.is_training: False
                         }
        else:
            feed_dict = {self.test_model.x_pl: x_batch,
                         self.test_model.y_pl: y_batch,
                         self.test_model.is_training: False
                         }

        # calculate the time of one inference
        start = time.time()

        # run the feed_forward
        _ = self.sess.run(
            [self.test_model.out_argmax],
            feed_dict=feed_dict)

anusrees avatar Jan 03 '19 11:01 anusrees