SRGAN-tensorflow

Error while training

Open nickhdfan opened this issue 7 years ago • 4 comments

OK, say I want to train SRGAN rather than SRResnet. The checkpoint here points to ./experiment_SRGAN_MSE/model-500000, but it doesn't exist! Is it created once SRResnet training has finished?

I also got this error: KeyError: 'vgg19_1/vgg_19/conv5/conv5_4'. How do I resolve it?

nickhdfan avatar Feb 04 '18 02:02 nickhdfan

I have mentioned this in README.md. If you want to go through the complete training process, you need to train SRResnet first and then use it as the pre-trained weights for SRGAN. If you want to train SRGAN directly, you can modify train_SRGAN.sh to train it without pre-trained weights.

Regarding the second question, where did you encounter it?

brade31919 avatar Feb 04 '18 15:02 brade31919

I encountered it when training SRGAN with MSE loss. I haven't finished training SRResnet yet; SRResnet took 9000 minutes.

nickhdfan avatar Feb 04 '18 20:02 nickhdfan

Concerning resuming the training process, I got this error:

Loading model from the checkpoint...
TypeError: expected bytes, NoneType found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 315, in
    saver.restore(sess, checkpoint)

I tried resuming on my instances running CUDA 9 and cuDNN 7, the same setup used to train and create the checkpoint file, and training runs successfully. How can I tell whether training was actually resumed or not?

nickhdfan avatar Feb 06 '18 04:02 nickhdfan
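
A "expected bytes, NoneType found" error raised from saver.restore is consistent with the checkpoint path being None or pointing at a file that was never written, although the thread does not confirm the cause. A minimal sketch of a pre-flight check, assuming the checkpoint was saved in TensorFlow's default format, which writes a `<prefix>.index` file next to the data files. The helper name `checkpoint_exists` is hypothetical and is not part of this repository:

```python
import os


def checkpoint_exists(prefix):
    """Return True if a checkpoint with this prefix appears to exist on disk.

    Assumption: the checkpoint uses the default TensorFlow save format,
    which writes '<prefix>.index' alongside the data shards.
    """
    if not prefix:
        # None or an empty string cannot be restored from.
        return False
    return os.path.isfile(prefix + ".index")


if __name__ == "__main__":
    # Path taken from the question above; adjust to your own experiment folder.
    prefix = "./experiment_SRGAN_MSE/model-500000"
    if checkpoint_exists(prefix):
        print("Checkpoint found, restore should be possible:", prefix)
    else:
        print("No checkpoint at", prefix, "- training would start from scratch")
```

Running a check like this before the restore call gives a clear message instead of the TypeError, and it also answers whether there is anything to resume from at all.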

I got this error : KeyError: 'vgg19_1/vgg_19/conv5/conv5_4'

See Issue #6. It shows how to modify model.py to use the correct key.

ryancom16 avatar Apr 21 '18 22:04 ryancom16
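
For readers who hit the same KeyError, a rough sketch of how to find the key that actually exists. It assumes the VGG end points are held in a plain dictionary keyed by scope name, and that only the scope prefix differs from 'vgg19_1/vgg_19/conv5/conv5_4'. The helper `find_endpoint_key` and the example keys are hypothetical; this is not the fix from Issue #6, only a way to inspect which key to use:

```python
def find_endpoint_key(end_points, suffix="conv5/conv5_4"):
    """Return the single key in end_points that ends with the given suffix.

    Raises KeyError listing the available keys when there is no unique match,
    which makes a scope-name mismatch easy to spot.
    """
    matches = [key for key in end_points if key.endswith(suffix)]
    if len(matches) != 1:
        raise KeyError(
            "expected one key ending with %r, found %r among %r"
            % (suffix, matches, sorted(end_points))
        )
    return matches[0]


if __name__ == "__main__":
    # Hypothetical end points; real keys depend on the variable scope in model.py.
    end_points = {
        "vgg_19/conv5/conv5_3": "tensor_a",
        "vgg_19/conv5/conv5_4": "tensor_b",
    }
    print(find_endpoint_key(end_points))
```

Printing the available keys this way shows which scope prefix the graph really uses, so the lookup in model.py can be changed to match.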