
Overfitting and difference in results compared to the pretrained checkpoint

Open prajnan93 opened this issue 2 years ago • 1 comments

Hi,

I am trying to replicate the training results by following the steps mentioned in the README. However, the model appears to overfit after the three FlyingChairs training stages when evaluated on Sintel.

I have downloaded the pretrained chairs checkpoint provided in the repository and compared its EPE to that of our trained model. Here are the results:

| | Chairs | Sintel Clean | Sintel Final |
| --- | --- | --- | --- |
| pretrained | 2.56 | 4.63 | 6.07 |
| ours | 1.56 | 7.76 | 15.58 |

As the table shows, our model is clearly overfitting on chairs. Could you please help me replicate the training so that it matches the results of the pretrained model?

I have also noticed that the paper mentions a total of 150K iterations on FlyingChairs and the README mentions 120 epochs on each stage of FlyingChairs. This results in a total of 360 epochs which is more than 150K iterations. Here is the EPE on chairs after each stage of training:

| stage 0 | stage 1 | stage 2 |
| --- | --- | --- |
| 2.16 | 1.65 | 1.56 |

Could you please provide the correct number of epochs for each stage of chairs?

prajnan93 avatar May 23 '22 13:05 prajnan93

Hi @prajnan93 ,

Thank you for the interest! The FlyingChairs dataset has 22,232 training images, and we train the model on it with a batch size of 64, as shown in the README. Each FlyingChairs epoch therefore corresponds to 22232/64 ≈ 350 = 0.35k iterations, so 360 epochs consume around 125k iterations, which is fewer than 150k.
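The arithmetic above can be sketched as follows (the dataset size, batch size, and epoch counts are the values quoted in this thread; the variable names are illustrative):

```python
# Iteration-count arithmetic from the reply above.
num_images = 22232        # FlyingChairs training images
batch_size = 64           # batch size from the README
epochs_per_stage = 120    # README: 120 epochs per stage
num_stages = 3            # three FlyingChairs stages

iters_per_epoch = num_images // batch_size                     # ~347, i.e. ~0.35k
total_iters = iters_per_epoch * epochs_per_stage * num_stages  # ~125k < 150k
print(iters_per_epoch, total_iters)
```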

Regarding the overfitting problem, are you using a batch size of 64? With a smaller batch size, the learning rate needs to be decreased correspondingly. For example, with a batch size of 16 instead of 64, we empirically reduce the lr to 0.25× its original value.
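The advice above amounts to linear learning-rate scaling with batch size. A minimal sketch, assuming the base configuration uses batch size 64 (the base lr value here is a placeholder, not the repository's actual setting):

```python
def scaled_lr(base_lr: float, base_bs: int, actual_bs: int) -> float:
    """Scale the learning rate linearly with the batch size (lr ∝ bs)."""
    return base_lr * actual_bs / base_bs

# e.g. a hypothetical base lr of 4e-4 at bs 64, trained instead at bs 16:
print(scaled_lr(4e-4, 64, 16))  # 0.25x of the base lr
```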

Best, Jianyuan

jytime avatar Aug 21 '22 21:08 jytime