texture_nets

Training the model failed

Open puppet101 opened this issue 9 years ago • 5 comments

Hi, I tried to train my model on the ImageNet validation set, which contains 50k images. Early in training, i.e. while the iteration count is still below a small number like 8000, I get reasonable test results from the trained model. But as training goes on, I get a black image, in which the output pixel values are NaN! All the training parameters are unchanged, and the training image is "cezanne.jpg", which is included in the texture_nets_v1 branch. Should I change the learning rate? Could you please give me some advice about this problem? Thanks!

puppet101 avatar Aug 03 '16 11:08 puppet101
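The symptom reported above (reasonable outputs early, NaN pixels later) is easier to pin down if the training loop stops at the first non-finite loss and reports the last good iteration. The following is only a minimal sketch in Python, not the Torch code from texture_nets; `step_fn` is a hypothetical callback standing in for one training iteration, and both function names are invented for illustration.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN/Inf in `values`, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


def train_with_nan_guard(step_fn, num_steps):
    """Call `step_fn(step)` for each step until it returns a non-finite loss.

    `step_fn` is a hypothetical stand-in for one training iteration and
    must return the scalar loss as a float.
    Returns (last_good_step, bad_step); bad_step is None if every loss was finite.
    """
    last_good = None
    for step in range(num_steps):
        loss = step_fn(step)
        if not math.isfinite(loss):
            # Stop here so the last finite checkpoint can be inspected.
            return last_good, step
        last_good = step
    return last_good, None
```

Knowing the exact iteration at which the loss first becomes non-finite would also make it easier to tell whether the failure lines up with an epoch boundary or with a gradual blow-up.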

I typically use 1e-3 lr for Johnson's model and 1e-2 (or higher) for others.

What batch size did you use? There is probably a problem in the dataloader, and it does not advance the epoch properly. I will take a look.

DmitryUlyanov avatar Aug 03 '16 12:08 DmitryUlyanov
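Whether the learning rate is the actual cause here is not confirmed in this thread, but a toy example shows why it is a common suspect when outputs turn non-finite. The sketch below runs plain gradient descent on a one-dimensional quadratic; the function name, the `curvature` value, and the step count are assumptions chosen for illustration and have nothing to do with the texture_nets models.

```python
import math


def gradient_descent(lr, curvature=10.0, x0=1.0, steps=2000):
    """Minimise f(x) = 0.5 * curvature * x**2 with plain gradient descent.

    Each update multiplies x by (1 - lr * curvature), so the iterate
    shrinks when lr < 2 / curvature and grows without bound otherwise.
    Returns (final_x, step at which x first became non-finite, or None).
    """
    x = x0
    for step in range(steps):
        grad = curvature * x  # derivative of 0.5 * curvature * x**2
        x = x - lr * grad
        if not math.isfinite(x):
            return x, step
    return x, None


if __name__ == "__main__":
    print(gradient_descent(1e-3))  # small step: stays finite, moves toward 0
    print(gradient_descent(0.3))   # too large for this curvature: overflows
```

In this toy, the overflow first produces an infinity, and a further update of the form `inf - inf` would then yield NaN, which is one way NaN values can appear only after many apparently healthy iterations.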

The batch size is 1, all of the parameters are unchanged. Can you train any model correctly using this code? It seems that the training goes wrong suddenly~

When I change the pyramid model to Johnson's model, it works fine. So maybe there is a problem with the pyramid model.

puppet101 avatar Aug 04 '16 02:08 puppet101

It is probably something to do with the learning rate. I tested both. I will take a look.

DmitryUlyanov avatar Aug 04 '16 15:08 DmitryUlyanov

Same issue for me. After enough iterations, the model returns black images.

sheerun avatar Aug 09 '16 14:08 sheerun

I cannot reproduce it. Can you please specify your command?

DmitryUlyanov avatar Aug 12 '16 09:08 DmitryUlyanov