squeezeDet

Final model checkpoint is not saved even after running for many hours.

Open muthiyanbhushan opened this issue 6 years ago • 5 comments

Hello,

I am training the model for 100000 steps.

The saved model checkpoint consists of 3 files:

  1. model.ckpt-99999.data-00000-of-00001
  2. model.ckpt-99999.index
  3. model.ckpt-99999.meta

But the final checkpoint is not stored as a single file, like your "model.ckpt-87000".

Can you please let me know how I can get this checkpoint file?

Thanks.

muthiyanbhushan, Mar 06 '18 16:03

Hello Bichen,

Can you please let me know why I might not be getting the final weights?

Thanks.

muthiyanbhushan, Mar 07 '18 03:03

@muthiyanbhushan I am also confused about this. Have you solved it? I simply deleted ".index" from the file name model.ckpt-99999.index, but it does not seem to work properly.

eleboss, Mar 19 '18 05:03

And this is my error:

tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "fire7/squeeze1x1/biases" not found in checkpoint files ./data/model_checkpoints/squeezeDet/model.ckpt-999
    [[Node: save/RestoreV2_50 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_50/tensor_names, save/RestoreV2_50/shape_and_slices)]]
    [[Node: save/RestoreV2_51/_63 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:1", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_257_save/RestoreV2_51", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:1"]()]]

I really don't understand why it says a layer is missing.
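The NotFoundError above means TensorFlow did not find a variable it expected in the checkpoint files at the path it was given. One way to catch a wrong, shortened, or renamed path before restoring is a pre-flight check on the files behind a prefix. This is a minimal stdlib sketch, not part of squeezeDet: the function name `missing_checkpoint_files` is hypothetical, and it assumes the TF V2 layout listed in the first post (one `.index` file plus `.data-XXXXX-of-NNNNN` shards sharing a prefix, with an optional `.meta` graph file).

```python
import os
import re

# Suffix of a TF V2 data shard, e.g. ".data-00000-of-00001".
_SHARD_RE = re.compile(r"\.data-(\d+)-of-(\d+)")


def missing_checkpoint_files(prefix):
    """Hypothetical helper: list files a V2 checkpoint prefix needs but lacks.

    `prefix` is the shared path WITHOUT an extension, e.g.
    "./data/model_checkpoints/squeezeDet/model.ckpt-99999".
    The ".meta" file (the saved graph) is not checked, because restoring
    variables into a graph built in code only reads ".index" and ".data-*".
    """
    missing = []
    if not os.path.isfile(prefix + ".index"):
        missing.append(prefix + ".index")

    directory = os.path.dirname(prefix) or "."
    base = os.path.basename(prefix)
    shards_seen, shard_total = set(), None
    if os.path.isdir(directory):
        for name in os.listdir(directory):
            if not name.startswith(base):
                continue
            # fullmatch on the remainder so a prefix like "model.ckpt-999"
            # does not accidentally match files of "model.ckpt-99999".
            m = _SHARD_RE.fullmatch(name[len(base):])
            if m:
                shards_seen.add(int(m.group(1)))
                shard_total = int(m.group(2))

    if shard_total is None:
        missing.append(prefix + ".data-?????-of-?????")
    else:
        for i in range(shard_total):
            if i not in shards_seen:
                missing.append("%s.data-%05d-of-%05d" % (prefix, i, shard_total))
    return missing
```

An empty list means every file the prefix needs is present; anything else names what is missing.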

eleboss, Mar 19 '18 05:03

(Update) After reading up on TensorFlow, I found that TensorFlow saves a checkpoint as several separate files, and we don't need to care how many files there are; we just read the checkpoint as a whole. In my case, I simply copied all of the files

model.ckpt-99999.data-00000-of-00001
model.ckpt-99999.index
model.ckpt-99999.meta

and used model.ckpt-99999 (the shared prefix, without any extension) as the checkpoint path to read. That solved it.
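The fix above amounts to passing the shared prefix, not any single file, as the checkpoint path. A minimal stdlib sketch for locating the newest such prefix in a checkpoint directory follows. `latest_checkpoint_prefix` is a hypothetical name, and it scans `*.index` file names, whereas TensorFlow's own `tf.train.latest_checkpoint` reads the `checkpoint` state file that the saver writes.

```python
import os
import re

# A V2 checkpoint is identified by its ".index" file, e.g.
# "model.ckpt-99999.index" -> prefix "model.ckpt-99999", step 99999.
_INDEX_RE = re.compile(r"(.+-(\d+))\.index")


def latest_checkpoint_prefix(directory):
    """Hypothetical helper: path prefix of the highest-step checkpoint.

    The result has no extension, e.g. ".../model.ckpt-99999".
    Returns None when no "<name>-<step>.index" file is found.
    """
    best_step, best_prefix = -1, None
    for name in os.listdir(directory):
        m = _INDEX_RE.fullmatch(name)
        if m and int(m.group(2)) > best_step:
            best_step = int(m.group(2))
            best_prefix = os.path.join(directory, m.group(1))
    return best_prefix
```

In TensorFlow 1.x code, the returned prefix would then be passed as `save_path` to `saver.restore(sess, save_path)`.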

eleboss, Mar 19 '18 06:03

@eleboss,

I am also following a similar procedure for now. Thanks for your response.

muthiyanbhushan, Mar 19 '18 19:03