diagnose-and-explain

I am running it on a GPU, not a TPU, and getting an error in estimator.train(input_fn=input_fn, max_steps=1000)

Open · ShivamPanchal opened this issue · 1 comment

WARNING:tensorflow:From :32: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two options available in V2.
- tf.py_function takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func (it is not differentiable, and manipulates numpy arrays). It drops the stateful argument making all functions stateful.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
Model_Fn Shapes: (1024, 64, 2048) (1024, 10, 18)
Features: Tensor("IteratorGetNext:0", shape=(1024, 64, 2048), dtype=float32)
WARNING:tensorflow:Entity <bound method CNN_Encoder.call of <model.CNN_Encoder object at 0x000001ED90CD60F0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method CNN_Encoder.call of <model.CNN_Encoder object at 0x000001ED90CD60F0>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING: Entity <bound method CNN_Encoder.call of <model.CNN_Encoder object at 0x000001ED90CD60F0>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method CNN_Encoder.call of <model.CNN_Encoder object at 0x000001ED90CD60F0>>: AssertionError: Bad argument number for Name: 3, expecting 4
ERROR:tensorflow:Error recorded from training_loop: unsupported operand type(s) for /: 'Dimension' and 'int', please use // instead
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error

TypeError                                 Traceback (most recent call last)
in
      1 # TPUEstimator.train requires a max_steps argument.
----> 2 estimator.train(input_fn=input_fn, max_steps=1000)

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
   2874 finally:
   2875 rendezvous.record_done('training_loop')
-> 2876 rendezvous.raise_errors()
   2877
   2878 def evaluate(self,

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\error_handling.py in raise_errors(self, timeout_sec)
    129 else:
    130 logging.warn('Reraising captured error')
--> 131 six.reraise(typ, value, traceback)
    132
    133 for k, (typ, value, traceback) in kept_errors:

D:\Anaconda3\envs\py363\lib\site-packages\six.py in reraise(tp, value, tb)
    691 if value.__traceback__ is not tb:
    692 raise value.with_traceback(tb)
--> 693 raise value
    694 finally:
    695 value = None

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
   2869 steps=steps,
   2870 max_steps=max_steps,
-> 2871 saving_listeners=saving_listeners)
   2872 except Exception: # pylint: disable=broad-except
   2873 rendezvous.record_error('training_loop', sys.exc_info())

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    365
    366 saving_listeners = _check_listeners_type(saving_listeners)
--> 367 loss = self._train_model(input_fn, hooks, saving_listeners)
    368 logging.info('Loss for final step: %s.', loss)
    369 return self

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
   1156 return self._train_model_distributed(input_fn, hooks, saving_listeners)
   1157 else:
-> 1158 return self._train_model_default(input_fn, hooks, saving_listeners)
   1159
   1160 def _train_model_default(self, input_fn, hooks, saving_listeners):

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
   1186 worker_hooks.extend(input_hooks)
   1187 estimator_spec = self._call_model_fn(
-> 1188 features, labels, ModeKeys.TRAIN, self.config)
   1189 global_step_tensor = training_util.get_global_step(g)
   1190 return self._train_with_estimator_spec(estimator_spec, worker_hooks,

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in _call_model_fn(self, features, labels, mode, config)
   2707 else:
   2708 return super(TPUEstimator, self)._call_model_fn(features, labels, mode,
-> 2709 config)
   2710 else:
   2711 return super(TPUEstimator, self)._call_model_fn(features, labels, mode,

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _call_model_fn(self, features, labels, mode, config)
   1144
   1145 logging.info('Calling model_fn.')
-> 1146 model_fn_results = self._model_fn(features=features, **kwargs)
   1147 logging.info('Done calling model_fn.')
   1148

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in _model_fn(features, labels, mode, config, params)
   2965 logging.info('Running %s on CPU', mode)
   2966 estimator_spec = model_fn_wrapper.call_without_tpu(
-> 2967 features, labels, is_export_mode=is_export_mode)
   2968 if (self._log_every_n_steps is not None
   2969 or self._log_every_n_secs is not None):

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in call_without_tpu(self, features, labels, is_export_mode)
   1547
   1548 def call_without_tpu(self, features, labels, is_export_mode):
-> 1549 return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
   1550
   1551 def _add_embedding_features(self, features, hook_dummy_table_variables):

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow_estimator\python\estimator\tpu\tpu_estimator.py in _call_model_fn(self, features, labels, is_export_mode)
   1865 _add_item_to_params(params, _CTX_KEY, user_context)
   1866
-> 1867 estimator_spec = self._model_fn(features=features, **kwargs)
   1868 if (running_on_cpu and
   1869 isinstance(estimator_spec, model_fn_lib._TPUEstimatorSpec)): # pylint: disable=protected-access

in model_fn(features, labels, mode, params)
     52 optimizer = tf.train.AdamOptimizer()
     53 optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
---> 54 loss, gradients, variables = trainer.train_fn(batch_size, features, labels)
     55 train_op = optimizer.apply_gradients(zip(gradients, variables), tf.train.get_or_create_global_step())
     56

~\Desktop\ksardana\pycharm4\diagnose-and-explain-master\model.py in train_fn(self, batch_size, img_tensor, findings)
    178 bwd_hidden = tf.zeros((batch_size, self.units))
    179 # Generate Findings
--> 180 for i in range(int(findings.shape[1]/2)): # for each sentence in "findings" (each batch has a fixed # of sentences)
    181 print("-------------------------------------i:", i)
    182 loss, prev_sentence, fwd_hidden, bwd_hidden = self.train_word_decoder(batch_size, loss, features, findings, i, \

D:\Anaconda3\envs\py363\lib\site-packages\tensorflow\python\framework\tensor_shape.py in __truediv__(self, other)
    530 """
    531 raise TypeError("unsupported operand type(s) for /: 'Dimension' and '{}', "
--> 532 "please use // instead".format(type(other).__name__))
    533
    534 def __rtruediv__(self, other):

TypeError: unsupported operand type(s) for /: 'Dimension' and 'int', please use // instead
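The failing expression is line 180 of model.py, `int(findings.shape[1]/2)`. In TensorFlow 1.x, indexing a static shape returns a `Dimension` object rather than a plain Python int, and under Python 3 the `/` operator calls `Dimension.__truediv__`, which (per the tensor_shape.py frame above) only raises this TypeError and points to `//`. The log shows `findings` with shape (1024, 10, 18), so the loop is meant to run 10 // 2 = 5 times. A likely fix, assuming the second dimension stays statically known and the rest of `train_fn` is unchanged, is to convert to an int before dividing: `for i in range(int(findings.shape[1]) // 2):` (or, equivalently, `findings.shape.as_list()[1] // 2`). If other lines in model.py apply `/` to a static shape entry, they would need the same change.

The sketch below reproduces the mechanism without TensorFlow. `FakeDimension` is a hypothetical stand-in written only for illustration; it is not TensorFlow's class and merely mimics the `/` versus `//` behaviour seen in the traceback.

```python
class FakeDimension:
    """Hypothetical stand-in for TF 1.x's Dimension (illustration only)."""

    def __init__(self, value):
        # This sketch assumes a known int size; a real Dimension may hold None.
        self.value = value

    def __int__(self):
        return self.value

    def __floordiv__(self, other):
        # Floor division is allowed and returns another dimension object.
        return FakeDimension(self.value // int(other))

    def __truediv__(self, other):
        # Like the TF 1.x traceback above: reject "/" and suggest "//".
        raise TypeError(
            "unsupported operand type(s) for /: 'Dimension' and '{}', "
            "please use // instead".format(type(other).__name__)
        )


# findings.shape[1] taken from the logged shape (1024, 10, 18)
findings_dim = FakeDimension(10)

# The original expression from model.py line 180 fails on Python 3:
try:
    int(findings_dim / 2)
except TypeError as err:
    print("original expression fails:", err)

# Converting to a plain int first avoids Dimension arithmetic entirely.
num_iterations = int(findings_dim) // 2
print(num_iterations)  # 5
```

Converting with `int(...)` first means `range()` always receives a plain Python int, and for a non-negative length `int(x) // 2` gives the same count as the original `int(x/2)` was meant to produce.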

ShivamPanchal · Sep 08 '19 10:09

I have changed some lines of the code as follows. The model_dir points to an empty folder named model that I created:

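import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

tf.logging.set_verbosity(tf.logging.INFO)
# 'model' is the (initially empty) directory for checkpoints and summaries.
run_config = tf.contrib.tpu.RunConfig(model_dir='model')

# With use_tpu=False, TPUEstimator calls model_fn without a TPU
# (the log above reports "Running train on CPU").
estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,  # defined elsewhere (see the model_fn frame in the traceback)
    use_tpu=False,
    train_batch_size=1024,
    config=run_config
)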

Can you please tell me where the issue is?

ShivamPanchal · Sep 08 '19 10:09