DIGITS
Loading pretrained tensorflow model
Using DIGITS 6.0-rc with the new TensorFlow support, I am trying to port networks from TF-slim into DIGITS. I have implemented InceptionV1, and it trains from scratch in DIGITS; however, when I try to load a pretrained model I get the following error.
2017-08-22 11:22:00 [INFO] Train batch size is 16 and validation batch size is 16
2017-08-22 11:22:00 [INFO] Training epochs to be completed for each validation : 1
2017-08-22 11:22:00 [INFO] Training epochs to be completed before taking a snapshot : 1.0
2017-08-22 11:22:00 [INFO] Model weights will be saved as snapshot_<EPOCH>_Model.ckpt
2017-08-22 11:22:00 [INFO] Loading mean tensor from /jobs/20170613-080203-895a/mean.binaryproto file
2017-08-22 11:22:00 [INFO] Loading label definitions from /jobs/20170613-080203-895a/labels.txt file
2017-08-22 11:22:00 [INFO] Found 69 classes
2017-08-22 11:22:00 [INFO] Found 4189 images in db /jobs/20170613-080203-895a/train_db
2017-08-22 11:22:00.593158: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-22 11:22:00.593188: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-22 11:22:00.593201: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-22 11:22:00.593213: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-22 11:22:00.593226: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-22 11:22:00.836922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:04:00.0
Total memory: 3.94GiB
Free memory: 3.88GiB
2017-08-22 11:22:00.836979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-08-22 11:22:00.836997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-08-22 11:22:00.837029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:04:00.0)
2017-08-22 11:22:02 [INFO] Optimizer:sgd
2017-08-22 11:22:03 [INFO] Found 1429 images in db /jobs/20170613-080203-895a/val_db
2017-08-22 11:22:03.471877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:04:00.0)
2017-08-22 11:22:03.973740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:04:00.0)
2017-08-22 11:22:04 [INFO] Loading weights from pretrained model - /path/to/ckpt/inception_v1.ckpt
2017-08-22 11:22:05 [INFO] NOT restoring global_step -> global_step:0
2017-08-22 11:22:05 [INFO] Restoring 0 variable ops.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/digits/tools/tensorflow/main.py", line 707, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/lib/python2.7/dist-packages/digits/tools/tensorflow/main.py", line 544, in main
load_snapshot(sess, FLAGS.weights, tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES))
File "/usr/local/lib/python2.7/dist-packages/digits/tools/tensorflow/main.py", line 264, in load_snapshot
tf.train.Saver(vars_restore, max_to_keep=0, sharded=FLAGS.serving_export).restore(sess, weight_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1161, in build
raise ValueError("No variables to save")
ValueError: No variables to save
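The "Restoring 0 variable ops" line suggests the restore list DIGITS builds is empty before it ever reaches `tf.train.Saver`, which then raises `ValueError: No variables to save`. A plausible cause is a name mismatch between the graph's variable scopes and the TF-slim checkpoint's tensor names. Here is a rough illustration in plain Python (the `match_restore_vars` helper and the variable names are hypothetical, not DIGITS code):

```python
# Sketch: a restore list built by intersecting graph variable names with
# checkpoint tensor names. If the naming scopes differ, nothing matches,
# the list comes out empty, and tf.train.Saver([]) raises
# "No variables to save".

def match_restore_vars(graph_var_names, ckpt_tensor_names):
    """Return graph variable names that have a counterpart in the checkpoint."""
    ckpt = set(ckpt_tensor_names)
    return [name for name in graph_var_names if name in ckpt]

# Variables as named inside the DIGITS-built graph (note the extra scope):
graph_vars = ["model/InceptionV1/Conv2d_1a_7x7/weights",
              "model/InceptionV1/Conv2d_1a_7x7/BatchNorm/beta"]

# Tensors as named inside the TF-slim checkpoint:
ckpt_tensors = ["InceptionV1/Conv2d_1a_7x7/weights",
                "InceptionV1/Conv2d_1a_7x7/BatchNorm/beta"]

restore = match_restore_vars(graph_vars, ckpt_tensors)
print(len(restore))  # 0 -> nothing to hand to tf.train.Saver
```

Inspecting the checkpoint's tensor names (e.g. with TensorFlow's `inspect_checkpoint` tool) and comparing them against the graph's variable names is one way to confirm whether this is what is happening.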
Would it be possible to add support for restoring from non-DIGITS checkpoints?
TensorFlow is a bit finicky here in that the weights are saved not in one file but in three. The latest commit on the master branch now takes that into account: when you upload, it will process all the .ckpt files (data, index, meta), but it will still refer to them by the .ckpt stem. So when you load pretrained weights, make sure the extension is just .ckpt and the backend will search for all three files.
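To make the three-file layout concrete: for a stem like `inception_v1.ckpt`, TensorFlow's V2 checkpoint format writes `.data-*`, `.index`, and `.meta` sidecar files. A small sketch (the `checkpoint_files` helper is hypothetical; the filename convention is TensorFlow's usual one):

```python
# Expand a checkpoint stem such as "model.ckpt" into the files
# TensorFlow actually writes on disk for a V2 checkpoint.

def checkpoint_files(stem, num_shards=1):
    """Return the sidecar filenames for a given checkpoint stem."""
    files = ["%s.index" % stem, "%s.meta" % stem]
    files += ["%s.data-%05d-of-%05d" % (stem, i, num_shards)
              for i in range(num_shards)]
    return sorted(files)

print(checkpoint_files("inception_v1.ckpt"))
# ['inception_v1.ckpt.data-00000-of-00001',
#  'inception_v1.ckpt.index',
#  'inception_v1.ckpt.meta']
```

This is why you pass only the `.ckpt` stem to DIGITS: the backend resolves the three real files from it.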
Does this still stand for the released v6.0.0? I only see options to upload Torch or Caffe pre-trained models.
You should be able to upload TensorFlow pre-trained models now in v6.
If it's meant to be alongside the options for Torch and Caffe, it doesn't seem to be there. Here's a screenshot of my options under "Pretrained models" > "Upload...", for DIGITS v6.0.0 (installed from the Docker Hub image nvidia/digits:latest, which currently points to :6.0.0).
I'd expect a third "TensorFlow" option to enable loading a checkpoint. Is this incorrect? (And if so, how should one load a TensorFlow pre-trained model?)
Thanks for your help!
Same issue here. When will DIGITS support pre-trained TensorFlow models?
Hi. I am uploading a pretrained network to DIGITS and I am getting this error. Can I get some help on this?
You should add a labels.txt file. The file can be empty, but it must exist; this may solve your problem.
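For completeness, creating the (possibly empty) labels.txt is a one-liner; where to place it depends on your upload layout, so the path here is an assumption:

```python
import os

# An empty labels.txt is enough, per the advice above; place it next to
# the pretrained-model files you are uploading.
labels_path = "labels.txt"
if not os.path.exists(labels_path):
    open(labels_path, "a").close()  # create the empty file

print(os.path.exists(labels_path))  # True
```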
Any updates on this? I am seeing the same two options when trying to manually upload a pretrained TensorFlow model.