deep-learning-models
InvalidArgumentError thrown when using inception_v3 in distributed TensorFlow
If I delete the model.load_weights() call, or run with just 1 worker and 0 ps, it works. The exception stack is as follows:
  File "/home/work/tfonspark/rudder/apps/image_classify_inception_v3.py", line 78, in get_model
    input_shape=(Width, Height, 3))
  File "./rudder.zip/rudder/models/inception_v3.py", line 341, in InceptionV3
    branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
  File "./rudder.zip/rudder/models/inception_v3.py", line 84, in conv2d_bn
    x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x)
  File "/usr/local/python27/lib/python2.7/site-packages/keras/engine/topology.py", line 569, in __call__
    self.build(input_shapes[0])
  File "/usr/local/python27/lib/python2.7/site-packages/keras/layers/normalization.py", line 123, in build
    trainable=False)
  File "/usr/local/python27/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python27/lib/python2.7/site-packages/keras/engine/topology.py", line 391, in add_weight
    weight = K.variable(initializer(shape), dtype=dtype, name=name)
  File "/usr/local/python27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 321, in variable
    v = tf.Variable(value, dtype=_convert_string_dtype(dtype), name=name)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 200, in __init__
    expected_shape=expected_shape)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 297, in _init_from_args
    name=name)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 128, in variable_op_v2
    shared_name=shared_name)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 684, in _variable_v2
    name=name)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/python27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'batch_normalization_94/moving_variance': Operation was explicitly assigned to /job:ps/task:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0 ]. Make sure the device specification refers to a valid device.
  [[Node: batch_normalization_94/moving_variance = VariableV2[container="", dtype=DT_FLOAT, shape=[192], shared_name="", _device="/job:ps/task:0"]()]]
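To make the error message easier to read: the op was pinned to /job:ps/task:0, but every device the executing session can see belongs to /job:localhost. The sketch below compares the job names in those strings using only the standard library. The helper names `job_of` and `diagnose` are hypothetical and not part of TensorFlow or Keras, and the conclusion it prints is an interpretation of the message, not a confirmed diagnosis.

```python
import re

# Hypothetical helpers for reading the error message; not a TensorFlow API.
_JOB_RE = re.compile(r"/job:([^/\s,\]]+)")


def job_of(device):
    """Return the job name in a device string such as '/job:ps/task:0', or None."""
    match = _JOB_RE.search(device)
    return match.group(1) if match else None


def diagnose(requested, available):
    """Say whether the requested device's job appears among the available devices."""
    wanted = job_of(requested)
    # Drop entries without a job name so the set holds only strings.
    jobs = {job_of(d) for d in available} - {None}
    if wanted in jobs:
        return "job '%s' is visible to this session" % wanted
    if jobs == {"localhost"}:
        # Assumption: a session that only lists /job:localhost devices is a local one.
        return ("job '%s' is not visible; only 'localhost' devices are, "
                "so the session looks like a local one" % wanted)
    return "job '%s' is not visible; visible jobs: %s" % (wanted, sorted(jobs))


# Device strings copied from the error message above.
requested = "/job:ps/task:0"
available = [
    "/job:localhost/replica:0/task:0/cpu:0",
    "/job:localhost/replica:0/task:0/gpu:0",
]
print(diagnose(requested, available))
```

If that reading is right, one thing worth checking is whether the code path behind model.load_weights() runs through a session connected to the cluster rather than a default local one; that would also be consistent with the 1 worker, 0 ps case working.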