Indices do not index into param shape

Open johannes-michael opened this issue 5 years ago • 8 comments

I'm trying to run training with the data you provided, but I get index errors a few seconds into iteration 0:

python bin/iterate/table_adjacency_parsing.py /home/johannes/devel/projects/tr/configs/gravnet_fast_conv_partial.ini gravnet_fast_conv
(25, 900, 64)
(25, 900, 64)
(25, 900, 64)
(25, 900, 64)
The model has 972848 parameters.
Cleaned summary directory
Cleaned visual feedback output directory
2019-08-01 12:21:21.721217: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py:67: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:`tf.train.start_queue_runners()` was called when no queue runners were defined. You can safely remove the call to this deprecated function.
Starting iterations
Training Iteration 0:
2019-08-01 12:21:41.314411: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:41.317000: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:41.320124: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[9,899,5] = [9, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:43.658247: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[19,899,5] = [19, 900] does not index into param shape [25,900,128]
2019-08-01 12:21:43.668727: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[24,899,5] = [24, 900] does not index into param shape [25,900,128]
2019-08-01 12:21:43.673979: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[4,899,5] = [4, 900] does not index into param shape [25,900,128]
Traceback (most recent call last):
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[{{node conv_grav_net_fast_conv/GatherNd_6}} = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
    trainer.train()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py", line 82, in train
    model.run_training_iteration(sess, summary_writer, iteration_number)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 384, in run_training_iteration
    ops_result = sess.run(ops_to_run, feed_dict = feed_dict)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[node conv_grav_net_fast_conv/GatherNd_6 (defined at /home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py:201)  = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

Caused by op 'conv_grav_net_fast_conv/GatherNd_6', defined at:
  File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
    trainer.train()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py", line 48, in train
    model.initialize(training=True)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 92, in initialize
    self.build_computation_graphs()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 360, in build_computation_graphs
    self.build_classification_segments(graph_features, placeholders)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 270, in build_classification_segments
    sampled_indices, computation_graph, gt_matrix = self.do_monte_carlo_sampling(graph_features, gt_sampled_adj_matrix)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 201, in do_monte_carlo_sampling
    return samples, x, tf.gather_nd(gt_matrix, indexing_tensor_for_adj_matrices)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[node conv_grav_net_fast_conv/GatherNd_6 (defined at /home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py:201)  = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

Some indices seem to reference out of bounds. Do you have an idea what could be causing this?

Aug 01 '19 12:08 johannes-michael
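
The log above reports the index `[14, 899, 900]` against a parameter shape of `[25,900,900]`. Each axis of size 900 only accepts positions 0 to 899, so the last component is one past the end. A minimal sketch of a pre-flight check follows, assuming the indices can be inspected as plain Python tuples before they are fed to the graph; `out_of_bounds` is a hypothetical helper and is not part of TIES-2.0 or TensorFlow.

```python
def out_of_bounds(indices, shape):
    """Return (position, index) pairs whose index does not fit `shape`.

    Each index addresses the leading len(index) axes of an array with
    the given shape, which mirrors how gather_nd-style lookups work.
    """
    bad = []
    for pos, idx in enumerate(indices):
        fits = len(idx) <= len(shape) and all(
            0 <= i < dim for i, dim in zip(idx, shape)
        )
        if not fits:
            bad.append((pos, tuple(idx)))
    return bad


# Shapes and offending indices are taken from the log in this issue.
shape = (25, 900, 900)
indices = [(14, 899, 899), (14, 899, 900), (9, 899, 900)]
print(out_of_bounds(indices, shape))
# [(1, (14, 899, 900)), (2, (9, 899, 900))]
```

Running a check like this on the sampled indices should show whether the sampling step can produce the value 900 for a 900-wide axis.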

Any updates on this issue? :/

Aug 08 '19 15:08 hasslercastro

I was working on Ubuntu, trying to train on CPU, but it didn't work; I kept getting this problem again and again. I read similar issues in other projects where the solution was to train on GPU, so I did that (using Google Colab). That solved the problem! However, I'm not getting good results :/

Aug 08 '19 16:08 hasslercastro
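
One possible reason the error disappears on GPU: the TensorFlow documentation for `tf.gather_nd` states that an out-of-bounds index returns an error on CPU, while on GPU a 0 is stored in the corresponding output value. The sketch below imitates both behaviours in pure Python on a tiny adjacency matrix. `gather_nd_strict` and `gather_nd_zero_fill` are hypothetical stand-ins written for illustration, not TensorFlow functions, and whether this explains the poor results reported above is an assumption.

```python
def gather_nd_strict(params, indices):
    """Look up each full index in a nested list; raise if any is out of range."""
    out = []
    for idx in indices:
        value = params
        for axis, i in enumerate(idx):
            if not 0 <= i < len(value):
                raise IndexError(f"index {list(idx)} out of range on axis {axis}")
            value = value[i]
        out.append(value)
    return out


def gather_nd_zero_fill(params, indices):
    """Same lookup, but an out-of-range index silently yields 0."""
    out = []
    for idx in indices:
        try:
            out.append(gather_nd_strict(params, [idx])[0])
        except IndexError:
            out.append(0)
    return out


adjacency = [[1, 0, 1],
             [0, 1, 0],
             [1, 0, 1]]
indices = [(0, 2), (2, 3)]  # the second index is one past the last column

print(gather_nd_zero_fill(adjacency, indices))  # [1, 0]
```

In the zero-fill variant the bad index produces a 0 that looks like a valid "not adjacent" label, so training can proceed without an error while still learning from wrong ground truth.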

Ahh, this could be because of the TensorFlow version. Did you generate your own data? What's the size of your training set?

I am sorry, I have been on vacation. I'll make time for this repo in the coming days to resolve all the issues.

Aug 08 '19 20:08 shahrukhqasim

Yeah, it's very hard to run on CPU; it is very compute-intensive anyway.

Aug 08 '19 20:08 shahrukhqasim

Hi @hasslercastro! Can you please share the config you are using? I tried running this on GPU, but the problem still persists. Thanks!

Apr 17 '20 09:04 iamrishab

Got it running by changing the TensorFlow version mentioned in the repo.

Apr 22 '20 17:04 iamrishab

Hi @iamrishab, I'm having the same problem. Which TensorFlow version did you change to? Thanks.

May 01 '20 13:05 ghost

Hello @hasslercastro, @iamrishab. I am facing the same issue. I'm running this on a CPU machine with TensorFlow 2.2.0. Is there any way to change the code so that it runs with this configuration? Could you please advise?

Thank you in advance. :)

Aug 21 '20 17:08 kmanojkkmr