convnet-benchmarks
Tensorflow benchmarks cause error when running run_forward_backward
Running tensorflow 0.12.head (GPU supported)
No issues when running run_forward.
This is the output I got when running benchmark_alexnet.py:
2017-04-04 12:28:17.663471: step 10, duration = 0.099
2017-04-04 12:28:18.638741: step 20, duration = 0.097
2017-04-04 12:28:19.610754: step 30, duration = 0.097
2017-04-04 12:28:20.584540: step 40, duration = 0.098
2017-04-04 12:28:21.557400: step 50, duration = 0.097
2017-04-04 12:28:22.528207: step 60, duration = 0.096
2017-04-04 12:28:23.504192: step 70, duration = 0.097
2017-04-04 12:28:24.476781: step 80, duration = 0.097
2017-04-04 12:28:25.449244: step 90, duration = 0.097
2017-04-04 12:28:26.325418: Forward across 100 steps, 0.096 +/- 0.010 sec / batch
WARNING:tensorflow:From /home/atinzad/code/benchmark_alexnet.py:102 in loss.: concat (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2016-12-13.
Instructions for updating:
This op will be removed after the deprecation date. Please switch to tf.concat_v2().
Traceback (most recent call last):
File "
File "/usr/local/lib/python2.7/dist-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
  execfile(filename, namespace)
File "/usr/local/lib/python2.7/dist-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
  builtins.execfile(filename, *where)
File "/home/atinzad/code/benchmark_alexnet.py", line 221, in
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
  _sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/atinzad/code/benchmark_alexnet.py", line 217, in main
  run_benchmark()
File "/home/atinzad/code/benchmark_alexnet.py", line 206, in run_benchmark
  objective = loss(last_layer, labels)
File "/home/atinzad/code/benchmark_alexnet.py", line 102, in loss
  concated = tf.concat([indices, labels], 1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 111, in new_func
  return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1118, in concat
  return concat_v2(values, concat_dim, name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1053, in concat_v2
  ).assert_is_compatible_with(tensor_shape.scalar())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 756, in assert_is_compatible_with
  raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 128, 1) and () are incompatible
This might not help, but it's worth a try. I've seen this in a bunch of the models from the tensorflow/models repo after upgrading to tensorflow-v1.0, though it looks like you're using 0.12. In my case the error was: Shapes(n,m,1,1) and (1,1) are incompatible. It seemed to be related to changes in argument ordering that tensorflow-v1.0 made to some of the routines, including loss and one of the softmax routines. So I'm immediately suspicious that the tf.concat() call needs its arguments swapped, or something added or removed. I had to rearrange arguments for a few routines, if I recall.
Where did you get your benchmark_alexnet.py? If it was written for tf-v1.0 instead of 0.12, that might explain the error. I believe the tensorflow guys have a script for making other Python scripts v1.0 compatible. You could try upgrading to tensorflow-v1 and see if that fixes the problem.
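To make the argument-order suspicion concrete, here is a minimal sketch of the swap. It assumes that releases before 1.0 take tf.concat(concat_dim, values) while 1.0 and later take tf.concat(values, axis). That would fit the traceback, where the list [indices, labels] with shape (2, 128, 1) ends up being checked as if it were the scalar dimension. The helper name concat_args is hypothetical and not part of TensorFlow; it only reorders the arguments and does not call TensorFlow itself.

```python
def concat_args(tf_version, values, axis):
    """Return (values, axis) in the positional order tf.concat expects.

    Hypothetical helper, not part of TensorFlow. Assumes releases before
    1.0 use tf.concat(concat_dim, values) and 1.0+ use tf.concat(values, axis).
    """
    # The major version is the part before the first dot, e.g. "0" in "0.12.head".
    major = int(tf_version.split(".")[0])
    if major >= 1:
        return (values, axis)
    return (axis, values)


# Inside loss() in benchmark_alexnet.py, the call could then be written as:
#   concated = tf.concat(*concat_args(tf.__version__, [indices, labels], 1))
print(concat_args("0.12.head", ["indices", "labels"], 1))
print(concat_args("1.0.0", ["indices", "labels"], 1))
```

If you only ever run on 0.12, the simpler fix under the same assumption is to swap the two arguments by hand on line 102 of benchmark_alexnet.py.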
In benchmark_alexnet.py, try changing the order of the loss arguments, or use slim (I've had luck with this approach):
import tensorflow.contrib.slim as slim
See available functions here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim
Good luck