tfoptflow

bias not found in checkpoint

Open Banhalmi opened this issue 5 years ago • 5 comments

model: models/pwcnet-sm-6-2-multisteps-chairsthingsmix/pwcnet.ckpt-592000
gpu_devices = []
controller = '/device:CPU:0'
Windows 8.1, Python 3.6, TensorFlow 1.13
running: pwcnet_predict_from_img_pairs.py

Full error output:
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key pwcnet/ctxt/dc_conv31/bias not found in checkpoint [[node save/RestoreV2 (defined at C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_base.py:119) ]]

Caused by op 'save/RestoreV2', defined at:
  File "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\ptvsd_launcher.py", line 89, in <module>
    vspd.debug(filename, port_num, debug_id, debug_options, run_as)
  File "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\ptvsd\debugger.py", line 2631, in debug
    exec_file(file, globals_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\ptvsd\util.py", line 119, in exec_file
    exec_code(code, file, global_variables)
  File "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\ptvsd\util.py", line 95, in exec_code
    exec(code_obj, global_variables)
  File "C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\pwcnet_predict_from_img_pairs.py", line 58, in <module>
    nn = ModelPWCNet(mode='test', options=nn_opts)
  File "C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_pwcnet.py", line 231, in __init__
    super().__init__(name, mode, session, options)
  File "C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_base.py", line 66, in __init__
    self.build_graph()
  File "C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_base.py", line 266, in build_graph
    self.init_saver()
  File "C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_base.py", line 119, in init_saver
    self.saver = tf.train.Saver()
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 832, in __init__
    self.build()
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 332, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 580, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1655, in restore_v2
    name=name)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key pwcnet/ctxt/dc_conv31/bias not found in checkpoint [[node save/RestoreV2 (defined at C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_base.py:119) ]]

Output until the error:
C:\Users\BAndras\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Building model...
WARNING:tensorflow:From C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_pwcnet.py:1094: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\PROJECTS\SASMOB - hídas projekt\optical_flow\tfoptflow-master\tfoptflow\model_pwcnet.py:1221: conv2d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d_transpose instead.
... model built.
Loading model checkpoint c:/PROJECTS/SASMOB - hídas projekt/optical_flow/tfoptflow-master/tfoptflow/models/pwcnet-sm-6-2-multisteps-chairsthingsmix/pwcnet.ckpt-592000 for eval or testing...

WARNING:tensorflow:From C:\Users\BAndras\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from c:/PROJECTS/SASMOB - hídas projekt/optical_flow/tfoptflow-master/tfoptflow/models/pwcnet-sm-6-2-multisteps-chairsthingsmix/pwcnet.ckpt-592000
2019-03-28 12:13:28.455200: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key pwcnet/ctxt/dc_conv31/bias not found in checkpoint
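A minimal sketch for narrowing down this kind of error: compare the variable names the graph expects with the keys actually stored in the checkpoint. The helpers `diff_variable_names` and `count_by_scope` are hypothetical and pure Python; they only diff two lists of names and count the missing ones per scope, so they make no assumptions about tfoptflow's internals. The toy names in the `__main__` part are for illustration only.

```python
from collections import Counter


def diff_variable_names(graph_names, ckpt_names):
    """Return (missing, unused) name lists, both sorted.

    missing: names the graph wants to restore but the checkpoint lacks;
             each one would trigger "Key ... not found in checkpoint".
    unused:  names stored in the checkpoint that the graph never requests.
    """
    graph_set, ckpt_set = set(graph_names), set(ckpt_names)
    missing = sorted(graph_set - ckpt_set)
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


def count_by_scope(names, depth=2):
    """Count names by their first `depth` slash-separated scope components."""
    return Counter('/'.join(name.split('/')[:depth]) for name in names)


if __name__ == '__main__':
    # Toy name lists for illustration; real ones come from TensorFlow (see below).
    graph = ['pwcnet/ctxt/dc_conv31/kernel', 'pwcnet/ctxt/dc_conv31/bias']
    ckpt = ['pwcnet/ctxt/dc_conv31/kernel']
    missing, unused = diff_variable_names(graph, ckpt)
    print(missing)                  # ['pwcnet/ctxt/dc_conv31/bias']
    print(count_by_scope(missing))  # Counter({'pwcnet/ctxt': 1})
```

In TensorFlow 1.x, the checkpoint keys can be listed with `[name for name, _ in tf.train.list_variables(ckpt_path)]`, and, once the model graph has been built, the graph's names with `[v.op.name for v in tf.global_variables()]`. If all missing names fall under one scope such as `pwcnet/ctxt`, that suggests the graph was built with different options from the ones the checkpoint was trained with, rather than a damaged checkpoint file.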

Banhalmi, Mar 28 '19 12:03

@Banhalmi Have you solved the problem? I have the same problem.

apxlwl, Aug 23 '19 01:08

@Banhalmi @wlguan When I use the pretrained sm model, I get the same error. After switching to the lg model, it runs normally. Maybe there is something wrong with the sm model.

May I ask what inference speed you get with the lg model? With (436, 1024) images, each pair takes about 0.1 s; when I reduce the images to (256, 340), 150 pairs take 8.5 s in total. How about yours?
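To make speed numbers like these comparable, a minimal sketch of a per-pair timer is below. `predict_fn` is a hypothetical callable standing in for whatever runs inference on a list of image pairs; it is not a tfoptflow API. The first `warmup` pairs are run but not timed, on the assumption that the first session run usually carries one-off setup cost that would skew the average.

```python
import time


def seconds_per_pair(predict_fn, pairs, warmup=1):
    """Average wall-clock seconds per image pair.

    predict_fn: hypothetical callable that runs inference on a list of pairs.
    The first `warmup` pairs are run but excluded from the timing.
    """
    pairs = list(pairs)
    if len(pairs) <= warmup:
        raise ValueError('need more pairs than warm-up runs')
    # Warm-up runs: executed, not timed.
    for pair in pairs[:warmup]:
        predict_fn([pair])
    timed = pairs[warmup:]
    start = time.perf_counter()
    for pair in timed:
        predict_fn([pair])
    return (time.perf_counter() - start) / len(timed)
```

Timing one pair at a time reports a per-pair figure directly. A batch total has to be divided first: 8.5 s for 150 pairs works out to 8.5 / 150 ≈ 0.057 s per pair, which is the number to compare with the 0.1 s per pair quoted for the larger images.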

dagongji10, Sep 05 '19 03:09

@Banhalmi @philferriere Have you solved this problem? I hit the same problem when restoring the model. Strangely, the model ran and restored normally a few months ago, but now it cannot be restored.

liyunfei1994, Sep 06 '19 10:09

Same problem here; I am using TensorFlow 1.10.0 installed through conda.

This might be useful: ResNet-101 has the same problem, and some people solved it by deleting the folder and saving to a clean folder (see https://github.com/tensorflow/models/issues/5003#issuecomment-485274023). EDIT: I tried it, and it does not seem to help for this small model.

jeffbaena, Sep 11 '19 03:09

Well, I've found a solution, but the lg model seems better for me. With input size [384, 512], inference took 0.08 s with the lg model and 0.06 s with the sm model, but the sm model's results were not satisfying.

If you still want to use the sm model, you have to change the following nn_opts:

nn_opts['use_dense_cx'] = False
nn_opts['use_res_cx'] = False

Both should be True for the lg model and False for the sm model.
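A minimal sketch that derives both flags from the checkpoint path, so the options cannot drift out of sync with the model being loaded. It assumes the directory naming seen in this thread (`pwcnet-sm-...`); the `pwcnet-lg-...` prefix and the helper name `configure_cx_options` are assumptions for illustration, not part of tfoptflow. It also avoids a subtle pitfall: writing `nn_opts['use_dense_cx'] = False,` with a trailing comma stores the tuple `(False,)`, which is truthy, so an `if` check on that option would treat it as True.

```python
from pathlib import PurePosixPath


def configure_cx_options(nn_opts, ckpt_path):
    """Set use_dense_cx / use_res_cx to match the checkpoint's model size.

    Assumes the checkpoint sits in a directory named 'pwcnet-sm-...' or
    'pwcnet-lg-...' (hypothetical convention based on this thread).
    """
    # Normalise Windows separators so PurePosixPath can split the path.
    model_dir = PurePosixPath(ckpt_path.replace('\\', '/')).parent.name
    if model_dir.startswith('pwcnet-sm-'):
        use_cx = False  # sm model: both flags off
    elif model_dir.startswith('pwcnet-lg-'):
        use_cx = True   # lg model: both flags on
    else:
        raise ValueError(f'cannot infer model size from {model_dir!r}')
    # Plain bools, no trailing comma (a trailing comma would store a truthy tuple).
    nn_opts['use_dense_cx'] = use_cx
    nn_opts['use_res_cx'] = use_cx
    return nn_opts
```

For example, the helper would be called just before `nn = ModelPWCNet(mode='test', options=nn_opts)` in pwcnet_predict_from_img_pairs.py, passing whatever checkpoint path the script loads.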

yacaeh, Sep 19 '19 02:09