
make train

Open cjy-cc opened this issue 4 years ago • 0 comments

Hi, I find this project very valuable and would like to reproduce it, but I have run into a problem and hope you can help. When I run `make train`, training fails with the following errors:

```
Initialize the graph with random parameters.
bucket 0: (10, 23) (3463)
bucket 1: (14, 23) (3396)
bucket 2: (28, 23) (2954)
Epoch 1
0%| | 0/4000 [00:00<?, ?it/s]2020-04-02 09:59:41.480441: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.485770: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.498355: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.502850: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.507510: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.510871: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.514412: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.517675: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.520942: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.523802: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.527851: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.530873: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.534375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.538748: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.543058: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.546131: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.549882: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.553088: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.556465: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.559658: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.563791: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.566951: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.570187: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.573230: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.576664: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.580121: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.583457: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.586287: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.589678: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.592683: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.596938: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.599845: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.604331: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.607419: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.610917: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.613959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.617808: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.620678: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.624038: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.627022: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.630438: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.634057: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.638341: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.642178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.645559: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.648711: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node token_decoder_decoder_rnn_2/Attention_0/Conv2D}}]]
     [[add_5/_849]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node token_decoder_decoder_rnn_2/Attention_0/Conv2D}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cc/anaconda3/envs/a/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 378, in <module>
    tf.compat.v1.app.run()
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 353, in main
    train(train_set, dataset)
  File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 95, in train
    sess, formatted_example, bucket_id, forward_only=False)
  File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 631, in step
    outputs = session.run(output_feed, input_feed)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node token_decoder_decoder_rnn_2/Attention_0/Conv2D (defined at /anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]
     [[add_5/_849]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node token_decoder_decoder_rnn_2/Attention_0/Conv2D (defined at /anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'token_decoder_decoder_rnn_2/Attention_0/Conv2D':
  File "/anaconda3/envs/a/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/a/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 378, in <module>
    tf.compat.v1.app.run()
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 353, in main
    train(train_set, dataset)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 64, in train
    model = define_model(sess, forward_only=False, buckets=train_set.buckets)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 53, in define_model
    FLAGS, session, Seq2SeqModel, buckets, forward_only)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/graph_utils.py", line 142, in define_model
    model = model_constructor(params, buckets)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/seq2seq/seq2seq_model.py", line 28, in __init__
    super(Seq2SeqModel, self).__init__(hyperparams, buckets)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 71, in __init__
    self.define_graph()
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 140, in define_graph
    encoder_copy_inputs=self.encoder_copy_inputs[:bucket[0]]
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 256, in encode_decode
    encoder_copy_inputs=encoder_copy_inputs)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/seq2seq/rnn_decoder.py", line 199, in define_graph
    decoder_cell(input_embedding, state)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/decoder.py", line 240, in __call__
    attns, alignments = self.attention(cell_output)
  File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/decoder.py", line 211, in attention
    input_tensor=l * tf.tanh(tf.nn.conv2d(input=v, filters=k, strides=[1,1,1,1], padding="SAME")), axis=[2, 3])
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1913, in conv2d_v2
    name=name)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
    name=name)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3360, in create_op
    attrs, op_def, compute_device)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
    op_def=op_def)
  File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1751, in __init__
    self._traceback = tf_stack.extract_stack()

0%| | 0/4000 [00:25<?, ?it/s]
Makefile:41: recipe for target 'train' failed
make: *** [train] Error 1
```

cjy-cc • Apr 02 '20 02:04