Ultra-Light-Fast-Generic-Face-Detector-1MB

Is 200 epochs enough to converge?

Open • eensdl opened this issue 3 years ago • 3 comments

I followed the instructions, and after 200 epochs the average loss is still about 3.

I wonder whether I have done something wrong.

Should the training converge within 200 epochs?

eensdl, Apr 22 '21 00:04

Yes, it should. For some reason, Colab was giving me bad results. After switching from Colab to a local GPU, the loss dropped to 2.6 within 110 epochs.

eensdl, Apr 22 '21 08:04

Hi, I have run into the same problem. However, in my case the regression loss is inf, while the classification loss is normal.

2021-04-21 11:38:41,798 - root - INFO - Epoch: 195, Step: 100, Average Loss: 1.7795, Average Regression Loss 0.7859, Average Classification Loss: 0.9937
............................................................................
2021-04-21 11:38:55,192 - root - INFO - lr rate :0.0001
2021-04-21 11:38:55,193 - root - INFO - lr rate :0.0001
2021-04-21 11:38:57,230 - root - INFO - Epoch: 195, Validation Loss: inf, Validation Regression Loss inf, Validation Classification Loss: 1.0416
2021-04-21 11:38:57,241 - root - INFO - Saved model ./models/train-version-slim/slim-Epoch-195-Loss-inf.pth
......................................................................................................
2021-04-21 11:39:17,186 - root - INFO - Epoch: 196, Step: 100, Average Loss: inf, Average Regression Loss inf, Average Classification Loss: 0.9936
............................................................................
2021-04-21 11:39:30,308 - root - INFO - lr rate :0.0001
......................................................................................................
2021-04-21 11:39:49,252 - root - INFO - Epoch: 197, Step: 100, Average Loss: 1.7731, Average Regression Loss 0.7626, Average Classification Loss: 1.0105
............................................................................
2021-04-21 11:40:02,683 - root - INFO - lr rate :0.0001
......................................................................................................
2021-04-21 11:40:22,183 - root - INFO - Epoch: 198, Step: 100, Average Loss: 1.7419, Average Regression Loss 0.7519, Average Classification Loss: 0.9900
............................................................................
2021-04-21 11:40:35,509 - root - INFO - lr rate :0.0001
......................................................................................................
2021-04-21 11:40:54,622 - root - INFO - Epoch: 199, Step: 100, Average Loss: 1.7601, Average Regression Loss 0.7651, Average Classification Loss: 0.9950
............................................................................
2021-04-21 11:41:08,277 - root - INFO - lr rate :0.0001
2021-04-21 11:41:08,277 - root - INFO - lr rate :0.0001
2021-04-21 11:41:10,293 - root - INFO - Epoch: 199, Validation Loss: inf, Validation Regression Loss inf, Validation Classification Loss: 1.0122
2021-04-21 11:41:10,305 - root - INFO - Saved model ./models/train-version-slim/slim-Epoch-199-Loss-inf.pth
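To see at a glance which epochs and which loss term went non-finite, a log like the one in this comment can be scanned mechanically. The following is a minimal sketch, not part of the project: `find_non_finite` is a hypothetical helper, and its regular expressions assume the exact log wording shown in this issue (`Average Regression Loss`, `Validation Regression Loss`, and so on). It uses only Python's standard `re` and `math` modules and relies on `float()` accepting the strings `inf` and `nan`.

```python
import math
import re

# Hypothetical helper, assuming the log wording shown in this issue.
# A loss value is either inf, nan, or a plain decimal number.
NUM = r"(inf|nan|-?\d+(?:\.\d+)?)"

# Per-step training summary, e.g. "Epoch: 196, Step: 100, Average Loss: inf, ..."
TRAIN_RE = re.compile(
    r"Epoch: (\d+), Step: \d+, Average Loss: " + NUM
    + r", Average Regression Loss " + NUM
    + r", Average Classification Loss: " + NUM
)

# Per-epoch validation summary, e.g. "Epoch: 195, Validation Loss: inf, ..."
VAL_RE = re.compile(
    r"Epoch: (\d+), Validation Loss: " + NUM
    + r", Validation Regression Loss " + NUM
    + r", Validation Classification Loss: " + NUM
)

LOSS_NAMES = ("total", "regression", "classification")


def find_non_finite(log_text):
    """Return sorted (epoch, phase, loss_name) tuples for every inf/nan loss."""
    hits = []
    for phase, pattern in (("train", TRAIN_RE), ("val", VAL_RE)):
        for match in pattern.finditer(log_text):
            epoch = int(match.group(1))
            # Groups 2-4 hold the total, regression and classification losses.
            for name, value in zip(LOSS_NAMES, match.groups()[1:]):
                if not math.isfinite(float(value)):
                    hits.append((epoch, phase, name))
    return sorted(hits)


if __name__ == "__main__":
    # Two entries copied from the log in this issue.
    sample = (
        "Epoch: 195, Validation Loss: inf, Validation Regression Loss inf, "
        "Validation Classification Loss: 1.0416\n"
        "Epoch: 196, Step: 100, Average Loss: inf, Average Regression Loss inf, "
        "Average Classification Loss: 0.9936\n"
    )
    for hit in find_non_finite(sample):
        print(hit)
```

Run over the full log in this comment, the helper would flag the regression loss (and therefore the total loss) at validation for epochs 195 and 199 and during training at epoch 196, and would never flag the classification loss, which matches the report that only the regression loss becomes inf.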

robbie2021, Apr 23 '21 06:04

Try downloading the project again.

eensdl, May 07 '21 07:05