I'm new to neural architecture search. Thank you.
@guotong1988 I am facing this issue when using the Evolved Transformer model:
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node evolved_transformer/parallel_0_5/evolved_transformer/evolved_transformer/body/encoder/layer_0/conv_branches/standard_conv_3x1/conv1d}}]]
[[evolved_transformer/parallel_0_5/evolved_transformer/evolved_transformer/body/decoder/layer_4/second_attend_to_encoder/multihead_attention/dot_product_attention/Max/_12719]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node evolved_transformer/parallel_0_5/evolved_transformer/evolved_transformer/body/encoder/layer_0/conv_branches/standard_conv_3x1/conv1d}}]]
0 successful operations.
0 derived errors ignored.
Have you faced this issue?
@minump This is a TensorFlow version problem; you can google it.
@minump By the way, you should format your post. It is hard to read.
@guotong1988 Thanks for the update. I was able to run other models with the same setup (TensorFlow, CUDA, cuDNN) and faced this issue only with evolved_transformer, so I thought the problem was in either the model or my implementation. Sorry for the copy-pasted error.
evolved_transformer has a conv op and your other models do not, which is why only it triggers the cuDNN error.
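A commonly reported cause of "cuDNN failed to initialize" is TensorFlow reserving all GPU memory up front, which leaves cuDNN nothing to work with. Here is a minimal sketch, assuming that is what is happening here (it may not be, and a TensorFlow/CUDA/cuDNN version mismatch would need a different fix). It sets the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which has to happen before TensorFlow is imported. The helper name `enable_gpu_memory_growth` is made up for illustration.

```python
import os


def enable_gpu_memory_growth(env=None):
    """Ask TensorFlow to grow GPU memory on demand instead of grabbing it all.

    Assumption: the installed TensorFlow build honors TF_FORCE_GPU_ALLOW_GROWTH.
    `env` defaults to the real process environment; a plain dict can be passed
    to try the helper without touching os.environ.
    """
    env = os.environ if env is None else env
    # Keep any value the user already exported; only fill in a default.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# Call this at the very top of the entry script, before `import tensorflow`.
enable_gpu_memory_growth()
```

Exporting the same variable in the shell before launching training or decoding should have the same effect.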
I am facing this issue with the Evolved Transformer model when I decode with it:
2 root error(s) found.
(0) Not found: Key evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/multihead_attention/k/kernel not found in checkpoint
[[node save/RestoreV2 (defined at /lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
(1) Not found: Key evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/multihead_attention/k/kernel not found in checkpoint
[[node save/RestoreV2 (defined at /lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
[[save/RestoreV2/_1417]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/bin/t2t-decoder", line 20, in <module>
tf.app.run()
File "/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/lib/python3.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/bin/t2t-decoder", line 12, in main
t2t_decoder.main(argv)
File "/lib/python3.7/site-packages/tensor2tensor/bin/t2t_decoder.py", line 214, in main
decode(estimator, hp, decode_hp)
File "/lib/python3.7/site-packages/tensor2tensor/bin/t2t_decoder.py", line 99, in decode
checkpoint_path=FLAGS.checkpoint_path)
File "/lib/python3.7/site-packages/tensor2tensor/utils/decoding.py", line 477, in decode_from_file
for elapsed_time, result in timer(result_iter):
File "/lib/python3.7/site-packages/tensor2tensor/utils/decoding.py", line 471, in timer
item = next(gen)
File "/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 629, in predict
hooks=all_hooks) as mon_sess:
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1038, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 749, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1231, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1236, in _create_session
return self._sess_creator.create_session()
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 902, in create_session
self.tf_sess = self._session_creator.create_session()
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 660, in create_session
self._scaffold.finalize()
File "/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 235, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 607, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 836, in __init__
self.build()
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 848, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 886, in _build
build_restore=build_restore)
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 510, in _build_internal
restore_sequentially, reshape)
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 389, in _AddShardedRestoreOps
name="restore_shard"))
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 336, in _AddRestoreOps
restore_sequentially)
File "/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 583, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1524, in restore_v2
name=name)
File "/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3485, in _create_op_internal
op_def=op_def)
File "/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
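A "Key ... not found in checkpoint" error usually means the graph built at decode time expects variable names that the checkpoint does not store, for example because the checkpoint was trained with a different model or hparams set. A rough sketch for checking this, assuming TensorFlow is installed: `tf.train.list_variables` is a real TensorFlow call, while the two helper names and the sample `stored` name are hypothetical.

```python
def missing_checkpoint_keys(expected_keys, checkpoint_keys):
    """Return the expected variable names that the checkpoint does not contain."""
    available = set(checkpoint_keys)
    return [key for key in expected_keys if key not in available]


def list_checkpoint_keys(checkpoint_path):
    """List the variable names stored in a checkpoint (requires TensorFlow)."""
    # Imported lazily so missing_checkpoint_keys works without TensorFlow.
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


# The key reported in the error above.
wanted = [
    "evolved_transformer/body/decoder/layer_0/first_attend_to_encoder/"
    "multihead_attention/k/kernel"
]
# Hypothetical checkpoint contents, standing in for list_checkpoint_keys(path).
stored = ["transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel"]

for key in missing_checkpoint_keys(wanted, stored):
    print("missing from checkpoint:", key)
```

If the stored names start with a different model scope than the one the decoder asks for, the `--model` or `--hparams_set` flags passed to `t2t-decoder` probably do not match the ones used for training.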