multi-class-text-classification-cnn-rnn
Training fails
Hi, I'm having this issue when I run training:
python3 train.py ./data/train.csv.zip ./training_config.json
CRITICAL:root:Accuracy on test set: 0.9971641706053186
Traceback (most recent call last):
File "train.py", line 161, in
I'll spend a bit of time tomorrow to see how to fix this problem.
Did you check the saved model directory? Looks like model-2700 doesn't exist.
os.rename(path, trained_dir + 'best_model.ckpt')
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints_1486165230/model-2700' -> './trained_results_1486165230/best_model.ckpt'
@jiegzhan Yes, the model-2700.* files do exist, but there is no file or directory named exactly model-2700:
ls -lrt ./checkpoints_1486165230/
total 71404
-rw-r--r-- 1 root root 1433 Feb 3 23:41 model-1600.index
-rw-r--r-- 1 root root 13073080 Feb 3 23:41 model-1600.data-00000-of-00001
-rw-r--r-- 1 root root 1543734 Feb 3 23:41 model-1600.meta
-rw-r--r-- 1 root root 1433 Feb 3 23:41 model-1700.index
-rw-r--r-- 1 root root 13073080 Feb 3 23:41 model-1700.data-00000-of-00001
-rw-r--r-- 1 root root 1543734 Feb 3 23:41 model-1700.meta
-rw-r--r-- 1 root root 1433 Feb 3 23:42 model-2200.index
-rw-r--r-- 1 root root 13073080 Feb 3 23:42 model-2200.data-00000-of-00001
-rw-r--r-- 1 root root 1543734 Feb 3 23:42 model-2200.meta
-rw-r--r-- 1 root root 1433 Feb 3 23:42 model-2400.index
-rw-r--r-- 1 root root 13073080 Feb 3 23:42 model-2400.data-00000-of-00001
-rw-r--r-- 1 root root 1543734 Feb 3 23:42 model-2400.meta
-rw-r--r-- 1 root root 1433 Feb 3 23:42 model-2700.index
-rw-r--r-- 1 root root 13073080 Feb 3 23:42 model-2700.data-00000-of-00001
-rw-r--r-- 1 root root 241 Feb 3 23:42 checkpoint
-rw-r--r-- 1 root root 1543734 Feb 3 23:42 model-2700.meta
I'll check whether train.py didn't write something correctly or whether the os.rename command is incorrect.
python3 -c 'import tensorflow as tf; print(tf.__version__)'
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
0.12.1
My TensorFlow version is 0.9; it only produces two training files.
-rw-r--r-- 1 root root 1433 Feb 3 23:42 model-2700.index
-rw-r--r-- 1 root root 1543734 Feb 3 23:42 model-2700.meta
-rw-r--r-- 1 root root 13073080 Feb 3 23:42 model-2700.data-00000-of-00001
The newer version has three training files, instead of two.
Do you get more files created in the checkpoints directory? I see *.meta, *.index, *.data-* and checkpoint.
Found this: https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md
New checkpoint format becomes the default in tf.train.Saver. Old V1 checkpoints continue to be readable; controlled by the write_version argument, tf.train.Saver now by default writes out in the new V2 format. It significantly reduces the peak memory required and latency incurred during restore.
Set the write_version argument if you are in a hurry.
I will try to upgrade TensorFlow and make the changes soon.
Thanks for pointing this out.
Yep, testing it with V1 now.
Yep, works fine with:
saver = tf.train.Saver(tf.all_variables(), write_version=tf.train.SaverDef.V1)
This is what the warning message looks like:
WARNING:tensorflow:*******************************************************
WARNING:tensorflow:*******************************************************
WARNING:tensorflow:TensorFlow's V1 checkpoint format has been deprecated.
WARNING:tensorflow:TensorFlow's V1 checkpoint format has been deprecated.
WARNING:tensorflow:Consider switching to the more efficient V2 format:
WARNING:tensorflow:Consider switching to the more efficient V2 format:
WARNING:tensorflow: tf.train.Saver(write_version=tf.train.SaverDef.V2)
WARNING:tensorflow: tf.train.Saver(write_version=tf.train.SaverDef.V2)
WARNING:tensorflow:now on by default.
WARNING:tensorflow:now on by default.
WARNING:tensorflow:*******************************************************
WARNING:tensorflow:*******************************************************
Thanks
Hi guys, is there any solution for the above issue? If so, please reply.
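For anyone who would rather keep the default V2 format than switch the Saver back to V1: the traceback above comes from os.rename being given the checkpoint prefix (model-2700), while V2 writes several files that only share that prefix (.index, .meta, .data-*). A minimal sketch of one possible workaround is to move every file with that prefix instead of a single file. The helper name move_checkpoint is hypothetical and not part of train.py or TensorFlow. It also assumes that later code restores from the new prefix explicitly, which I have not verified against this repo.

```python
import glob
import os
import shutil


def move_checkpoint(prefix, dest_prefix):
    """Move all files named `prefix`.* so that they are named `dest_prefix`.*.

    Hypothetical helper: `prefix` is a checkpoint prefix such as
    './checkpoints_1486165230/model-2700', not an actual file on disk.
    """
    # The trailing '.' keeps model-2700 from also matching model-27000.
    matches = sorted(glob.glob(glob.escape(prefix) + '.*'))
    if not matches:
        raise FileNotFoundError('No checkpoint files found for prefix: ' + prefix)

    # Create the destination directory if it does not exist yet.
    os.makedirs(os.path.dirname(dest_prefix) or '.', exist_ok=True)

    moved = []
    for src in matches:
        suffix = src[len(prefix):]  # e.g. '.index', '.meta', '.data-00000-of-00001'
        dst = dest_prefix + suffix
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

With the paths from the traceback, the call would look like move_checkpoint('./checkpoints_1486165230/model-2700', './trained_results_1486165230/best_model.ckpt'), in place of the os.rename line. The checkpoint state file in the old directory still points at the old path, so code that looks up the latest checkpoint from that file would need the new prefix passed in directly.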