Adversarial_Video_Generation

Tensor name not found in checkpoint file

Open newuhe opened this issue 7 years ago • 12 comments

Hello, I'm trying to use your trained model to predict one frame on your dataset, but I ran into this error:

NotFoundError (see above for traceback): Tensor name "generator/scale_3/setup/Variable_5/optimizer" not found in checkpoint files ../Models/Adversarial/model.ckpt-500000
[[Node: save/RestoreV2_153 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_153/tensor_names, save/RestoreV2_153/shape_and_slices)]]
[[Node: save/RestoreV2_63/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_651_save/RestoreV2_63", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

newuhe avatar May 17 '17 12:05 newuhe

Hey, sorry for the delay. Which version of TensorFlow are you using? The repo currently only works with versions up to 0.12.

dyelax avatar May 29 '17 19:05 dyelax

After fixing the functions in my other issue (https://github.com/dyelax/Adversarial_Video_Generation/issues/15), I get the same output as OP. @dyelax do you think the problem arises only when using the pre-trained model? Do you think I could train the network again and then run the code?

fverdoja avatar Jun 06 '17 10:06 fverdoja

I've just finished re-training the network on the Ms Pacman dataset and everything seems to work. If you want, I can share the newly trained model, which should be compatible with current installations of TensorFlow.

fverdoja avatar Jun 07 '17 08:06 fverdoja

@fverdoja Glad you got it working! Yes, would be great to have your trained model. Does it load with the current loading code in this project?

dyelax avatar Jun 08 '17 02:06 dyelax

@dyelax The training went well all the way through. I just tried to load the model, but sadly it gives the same error as OP. Maybe the way the model is saved is no longer correct? I can upload the trained model anyway if you want, so you can look into the problem; you will probably get further than I did.

fverdoja avatar Jun 10 '17 17:06 fverdoja

Which version of TensorFlow are you using?

dyelax avatar Jun 10 '17 17:06 dyelax

1.1.0 if I recall correctly.

fverdoja avatar Jun 10 '17 17:06 fverdoja

I haven't updated the repo to v1.1.0 yet. This only works for up to v0.12. Does the model loading work in the pull request you made to update the repo to 1.1.0?

dyelax avatar Jun 10 '17 17:06 dyelax

Nope, with the code in my pull request training works, but loading doesn't.

fverdoja avatar Jun 10 '17 18:06 fverdoja

OK, one of my thesis students figured out what the problem was. I, and I imagine OP as well, was loading the model with:

python avg_runner.py -l ../Save/Models/Default/model.ckpt-1000000.index

But with the way the model is saved in TF1+, the load path has to be given without the extension. When I call:

python avg_runner.py -l ../Save/Models/Default/model.ckpt-1000000

everything seems to work.

So I think you can safely merge my pull request. Everything works; just be aware that the model path has to be passed without the extension.
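
For anyone hitting the same thing, here is a minimal sketch of how the path could be normalized before loading. It assumes the usual TF 1.x checkpoint layout, where saving under a prefix such as model.ckpt-1000000 writes files ending in .index, .meta, and .data-00000-of-00001. The helper name checkpoint_prefix is hypothetical and not part of this repo; it only strips those suffixes so the result can be passed to -l.

```python
import re

# Suffixes TF 1.x adds to the checkpoint prefix on save (assumed V2 format):
# ".index", ".meta", and sharded data files like ".data-00000-of-00001".
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Hypothetical helper: return the checkpoint prefix for a path that may
    point at one of the checkpoint's files instead of the prefix itself."""
    return _CKPT_SUFFIX.sub("", path)


if __name__ == "__main__":
    # Path taken from the command above; prints the prefix to pass to -l.
    print(checkpoint_prefix("../Save/Models/Default/model.ckpt-1000000.index"))
```

A path that is already a bare prefix is returned unchanged, so it should be safe to apply this to whatever the user typed.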

fverdoja avatar Jun 28 '17 17:06 fverdoja

Here is a link to the model trained on TF 1.1: https://drive.google.com/drive/folders/0B83QXMRRjnSaYzJmQS1TWkZYMkU?usp=sharing

fverdoja avatar Jun 30 '17 09:06 fverdoja

It really was a TensorFlow version problem. Thanks for the help.

newuhe avatar Jul 13 '17 03:07 newuhe