personlab-tf
Failed to run training step
I downloaded the repo and went through all the steps of the setup notebook. While executing the example notebook, I get an error during training: INFO:tensorflow:Error reported to Coordinator: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
I tested on an AWS p2.xlarge and on a p3.8xlarge, and both give the same error. I used the AWS Deep Learning AMI with Ubuntu.
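How to read the message: `tf.gather_nd` treats the last axis of `indices` as one coordinate into `params` and checks it axis by axis against the param shape. For [4, 51, 50, 16] against [5,51,51,17], valid values on axis 1 are 0..50, so 51 is one past the end. The helper below is a hypothetical debugging aid (not part of personlab-tf) that reports which axes of a coordinate are out of range, mirroring the bounds check `gather_nd` does on CPU:

```python
def out_of_bounds_axes(index, shape):
    """Return the axes where `index` falls outside [0, dim) for `shape`.

    Hypothetical helper for reading gather_nd errors; it is not part of
    personlab-tf or TensorFlow.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    return [axis for axis, (i, dim) in enumerate(zip(index, shape))
            if not 0 <= i < dim]


# Coordinates copied from the error messages in the output below.
print(out_of_bounds_axes([4, 51, 50, 16], [5, 51, 51, 17]))  # [1]
print(out_of_bounds_axes([4, 49, 51, 5], [5, 51, 51, 17]))   # [2]
```

Both failing coordinates overshoot one of the two 51-wide spatial axes by exactly one, never the batch or keypoint axis.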
Full output:
loading annotations into memory...
Done (t=15.25s)
creating index...
index created!
loading annotations into memory...
Done (t=7.31s)
creating index...
index created!
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:737: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Restoring parameters from logs/sample/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path logs/sample/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
[[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
Caused by op 'GatherNd_5', defined at:
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
self.io_loop.start()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
handler_func(fileobj, events)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
if self.run_code(code, result):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-05888e252ab7>", line 8, in <module>
train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
res = model_base(model_output, inner_h, inner_w)
File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 36, in model_base
lo_y = gather_bilinear(lo_y, lo_p, (inner_h, inner_w)) + lo_y
File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
r = tf.gather_nd(params, idx)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
[[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
[[{{node GatherNd_5}} = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 495, in run
self.run_loop()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 1034, in run_loop
self._sv.global_step])
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
[[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1333 try:
-> 1334 return fn(*args)
1335 except errors.OpError as e:
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1318 return self._call_tf_sessionrun(
-> 1319 options, feed_dict, fetch_list, target_list, run_metadata)
1320
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1406 self._session, options, feed_dict, fetch_list, target_list,
-> 1407 run_metadata)
1408
InvalidArgumentError: indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
[[{{node GatherNd_1}} = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
993 start_standard_services=start_standard_services)
--> 994 yield sess
995 except Exception as e:
~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train(train_op, logdir, train_step_fn, train_step_kwargs, log_every_n_steps, graph, master, is_chief, global_step, number_of_steps, init_op, init_feed_dict, local_init_op, init_fn, ready_op, summary_op, save_summaries_secs, summary_writer, startup_delay_steps, saver, save_interval_secs, sync_optimizer, session_config, session_wrapper, trace_every_n_steps, ignore_live_threads)
769 total_loss, should_stop = train_step_fn(
--> 770 sess, train_op, global_step, train_step_kwargs)
771 if should_stop:
~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train_step(sess, train_op, global_step, train_step_kwargs)
486 options=trace_run_options,
--> 487 run_metadata=run_metadata)
488 time_elapsed = time.time() - start_time
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1151 results = self._do_run(handle, final_targets, final_fetches,
-> 1152 feed_dict_tensor, options, run_metadata)
1153 else:
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1327 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328 run_metadata)
1329 else:
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1347 message = error_interpolation.interpolate(message, self._graph)
-> 1348 raise type(e)(node_def, op, message)
1349
InvalidArgumentError: indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
[[node GatherNd_1 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]
Caused by op 'GatherNd_1', defined at:
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
self.io_loop.start()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
handler_func(fileobj, events)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
if self.run_code(code, result):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-05888e252ab7>", line 8, in <module>
train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
res = model_base(model_output, inner_h, inner_w)
File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 30, in model_base
mo_y = gather_bilinear(so_y, mo_p, (inner_h, inner_w)) + mo_y
File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
r = tf.gather_nd(params, idx)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
[[node GatherNd_1 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-2-05888e252ab7> in <module>()
6 log_dir = 'logs/sample/'
7
----> 8 train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
~/personlab-tf/personlab/model.py in train(model_func, data_generator, checkpoint_path, log_dir)
77 log_every_n_steps=100,
78 save_summaries_secs=300,
---> 79 session_config=sess_config,
80 )
81
~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train(train_op, logdir, train_step_fn, train_step_kwargs, log_every_n_steps, graph, master, is_chief, global_step, number_of_steps, init_op, init_feed_dict, local_init_op, init_fn, ready_op, summary_op, save_summaries_secs, summary_writer, startup_delay_steps, saver, save_interval_secs, sync_optimizer, session_config, session_wrapper, trace_every_n_steps, ignore_live_threads)
783 threads,
784 close_summary_writer=True,
--> 785 ignore_live_threads=ignore_live_threads)
786
787 except errors.AbortedError:
~/anaconda3/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
97 value = type()
98 try:
---> 99 self.gen.throw(type, value, traceback)
100 except StopIteration as exc:
101 # Suppress StopIteration *unless* it's the same exception that
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
1002 # threads which are not checking for `should_stop()`. They
1003 # will be stopped when we close the session further down.
-> 1004 self.stop(close_summary_writer=close_summary_writer)
1005 finally:
1006 # Close the session to finish up all pending calls. We do not care
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in stop(self, threads, close_summary_writer, ignore_live_threads)
830 threads,
831 stop_grace_period_secs=self._stop_grace_secs,
--> 832 ignore_live_threads=ignore_live_threads)
833 finally:
834 # Close the writer last, in case one of the running threads was using it.
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
387 self._registered_threads = set()
388 if self._exc_info_to_raise:
--> 389 six.reraise(*self._exc_info_to_raise)
390 elif stragglers:
391 if ignore_live_threads:
~/anaconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in stop_on_exception(self)
295 """
296 try:
--> 297 yield
298 except: # pylint: disable=bare-except
299 self.request_stop(ex=sys.exc_info())
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in run(self)
493 while not self._coord.wait_for_stop(next_timer_time - time.time()):
494 next_timer_time += self._timer_interval_secs
--> 495 self.run_loop()
496 self.stop_loop()
497
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in run_loop(self)
1032 if self._sv.global_step is not None:
1033 summary_strs, global_step = self._sess.run([self._sv.summary_op,
-> 1034 self._sv.global_step])
1035 else:
1036 summary_strs = self._sess.run(self._sv.summary_op)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
927 try:
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
931 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1150 if final_fetches or final_targets or (handle and feed_dict_tensor):
1151 results = self._do_run(handle, final_targets, final_fetches,
-> 1152 feed_dict_tensor, options, run_metadata)
1153 else:
1154 results = []
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1326 if handle is None:
1327 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328 run_metadata)
1329 else:
1330 return self._do_call(_prun_fn, handle, feeds, fetches)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1346 pass
1347 message = error_interpolation.interpolate(message, self._graph)
-> 1348 raise type(e)(node_def, op, message)
1349
1350 def _extend_graph(self):
InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
[[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33) = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
I have this error as well.
`INFO:tensorflow:Error reported to Coordinator: flat indices[1728695, :] = [2, 25, 51, 6] does not index into param (shape: [5,51,51,17]). [[Node: GatherNd_1 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]
Caused by op 'GatherNd_1', defined at:
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in
InvalidArgumentError (see above for traceback): flat indices[1728695, :] = [2, 25, 51, 6] does not index into param (shape: [5,51,51,17]). [[Node: GatherNd_1 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]`
@jrbasso @alexryan Sorry for the late reply. It seems there is a bug that makes the offset vectors point out of bounds. The error only occurs in a CPU environment; in a GPU environment it is silently ignored (https://github.com/tensorflow/tensorflow/issues/15091). I'll try to fix it as soon as possible. If you find a fix, please send a pull request.
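A minimal sketch of the kind of guard that would avoid the error, assuming the out-of-bounds indices come from sample points (or their "+1" bilinear neighbours) landing on row/column 51 of the 51x51 map. This is a hypothetical NumPy stand-in, not the repo's `gather_bilinear` in `personlab/util.py`, whose code is not shown here. It clips the fractional (y, x) coordinates to the map and clips the neighbour indices to the last row/column, so every index stays inside the shape. In the TensorFlow graph, the same idea would be `tf.clip_by_value` or `tf.minimum` applied to the integer indices before `tf.gather_nd`. For reference, the TensorFlow documentation for `tf.gather_nd` states that on GPU an out-of-bound index stores 0 in the output instead of raising an error, which is consistent with the "ignored on GPU" behavior described above.

```python
import numpy as np


def gather_bilinear_clamped(params, points):
    """Bilinearly sample a 2-D map at fractional (y, x) points.

    Hypothetical sketch, not personlab-tf's util.gather_bilinear: it assumes
    `params` has shape [H, W] and `points` has shape [..., 2] holding (y, x).
    """
    params = np.asarray(params, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    h, w = params.shape
    # Keep the sample points themselves inside the map.
    y = np.clip(points[..., 0], 0.0, h - 1)
    x = np.clip(points[..., 1], 0.0, w - 1)
    y0 = np.floor(y).astype(np.int64)
    x0 = np.floor(x).astype(np.int64)
    # Without these clips, a point on the last row/column would read row h or column w.
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional parts act as interpolation weights.
    wy = y - y0
    wx = x - x0
    top = params[y0, x0] * (1 - wx) + params[y0, x1] * wx
    bottom = params[y1, x0] * (1 - wx) + params[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


grid = np.arange(9, dtype=np.float64).reshape(3, 3)
print(gather_bilinear_clamped(grid, [[0.5, 0.5], [2.0, 2.0], [5.0, -1.0]]))
```

Clipping trades the hard error for border values, so it hides the symptom rather than explaining it; it is still worth checking why the predicted offsets point outside the map in the first place.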
@sydsim I actually ran into this issue in a GPU environment. Do you have any idea what I can try to mitigate it, or any clue I could research to try to fix it? Thanks.