
val_loss

15793723081 opened this issue 5 years ago • 3 comments

I have a new question. In my training runs, val_loss is very unstable, while train_loss looks good. What could be the reason for this?

val_loss

0.121626744 0.064643869 1.067177767 1.446711094 1.281244414 0.085074004 0.033228467 0.07067455 0.057629818 0.058747965 1.917625544 0.041074573 0.15646584 1.15546266 1.469133962 0.040493928 1.874986998 1.004034303 0.048311408 0.100535154 1.64835466 0.054501295 0.055335191 0.10456412 0.276266118 0.068388779 0.715268846 0.098956134 2.583597354 0.538678182 1.748115192 0.060186028 1.108167116 0.309113808 0.101157197 0.124394018 0.066762745 0.055764281 0.460467963 2.378881313 0.066589679 0.471164338 0.078409711 0.065184287 0.086847927 0.101323177 0.069627192 1.949108887 0.56550861 1.750528685 1.873376575 1.489622217 0.994260068 0.066217561 0.065698464 0.072816557 0.073953711 0.071316366 0.421468448 0.85330223 0.076412463 1.056728009 0.092980409 0.074839294 0.63967285 1.866399809 1.322311701 0.07251732 0.076650151 0.075991224 0.592627899 0.178139341 0.874911083 0.086961034 2.873830932 0.082318357 0.086678351 0.09311628 0.110618592 0.797192226 0.08203448 0.144572049 0.083447114 0.103286757 0.089605071 0.136912736 0.149895869 0.079707029 0.091769812 0.987303428 0.093305247 0.089239203 1.865140941 1.418952992 0.174026518 0.094368358 0.095077354 0.629811309 0.090620203 1.349854706

train_loss

0.230814192 0.028987918 0.024291774 0.017069094 0.014445363 0.013398573 0.017434838 0.010788895 0.010943851 0.010186778 0.008584953 0.01008794 0.008166692 0.00812426 0.007191894 0.006891105 0.007163724 0.006250243 0.007437692 0.005759141 0.005480012 0.007539251 0.005302504 0.005195955 0.005868624 0.004928016 0.004689556 0.00476825 0.004551805 0.004410871 0.004312812 0.005318373 0.003946777 0.00389254 0.004088193 0.004916692 0.003619704 0.003698593 0.003596091 0.003551835 0.003599238 0.003506489 0.003349912 0.004076907 0.003196912 0.003090051 0.003097881 0.003223821 0.003074504 0.003073007 0.002930183 0.002931052 0.002878063 0.002818113 0.004682063 0.002902587 0.00261943 0.002597745 0.002628764 0.002631262 0.003391225 0.002469422 0.002454039 0.002499085 0.002512494 0.00248172 0.002464812 0.002398441 0.002394155 0.002346173 0.002301199 0.002282861 0.002268845 0.002215712 0.002189308 0.002250406 0.002511505 0.002021703 0.00206177 0.002084404 0.002078039 0.00203906 0.002020018 0.001995647 0.001981924 0.001943689 0.001929661 0.001958759 0.001993959 0.001822729 0.001834546 0.001916094 0.001810232 0.001773823 0.001764436 0.001750559 0.001739222 0.002303985 0.001640475 0.001577288

15793723081 avatar Nov 27 '19 12:11 15793723081

Unfortunately, I cannot tell you the exact reason because I don't have enough information, but it seems like your model is overfitting. Do you use the same augmentation during validation as you use during training?
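To make the suggestion concrete: random augmentation (e.g. flips) belongs in the training transform only, while the validation transform should be deterministic. A toy paired transform illustrating the idea (a hypothetical stand-in for this repo's `JointTransform2D`, not its actual implementation):

```python
import random
import numpy as np

class JointFlip:
    """Toy paired transform: flips image and mask together with probability p_flip.
    Hypothetical stand-in for JointTransform2D; the point is that the validation
    transform should use p_flip=0 so validation is deterministic."""
    def __init__(self, p_flip=0.0):
        self.p_flip = p_flip

    def __call__(self, image, mask):
        if random.random() < self.p_flip:
            # Flip image and mask with the SAME decision so they stay aligned.
            image = np.fliplr(image).copy()
            mask = np.fliplr(mask).copy()
        return image, mask

tf_train = JointFlip(p_flip=0.5)  # random augmentation only during training
tf_val = JointFlip(p_flip=0.0)    # no random augmentation during validation
```

With randomness left in the validation transform, the validation loss is computed on a different view of the data every epoch, which by itself makes the val_loss curve noisy.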

cosmic-cortex avatar Nov 28 '19 08:11 cosmic-cortex

Question to @cosmic-cortex

Dear cosmic-cortex,

First of all, many thanks for sharing this implementation of the u-net. It seems to be a great deal of work.

Similar to 15793723081, I am also having problems with overfitting. The network seems not to generalize well.

I downloaded the "Kaggle Data Science Bowl 2018 nuclei detection challenge dataset". I used your script "kaggle_dsb18_preprocessing.py" to prepare the images and the masks. Of the 670 files in the dataset, I used 450 images (with their respective masks) to train the network and 220 images for validation. Please note that I did not separate the fluorescence, brightfield, and histopathological images; I used them all together for training.

For the network, I used depth=6 and width=32 and trained for 100 epochs. I think I am using augmentation, as I kept these two lines:

```python
tf_train = JointTransform2D(crop=crop, p_flip=0.5, color_jitter_params=None, long_mask=True)
tf_val = JointTransform2D(crop=crop, p_flip=0.5, color_jitter_params=None, long_mask=True)
```

Do you have any suggestions/comments on what to do?

Many thanks in advance.

alejandrodelac avatar Feb 07 '21 10:02 alejandrodelac

@alejandrodelac

Hi! I am going to be honest, the last time I trained a U-Net on this dataset was about three years ago, so I don't exactly remember what steps I took to make it generalize well.

If I remember correctly, these were the augmentation steps I used:

  • splitting up the images into 512 x 512 patches in advance
  • randomly cropping 256 x 256 patches
  • color_jitter_params=(0.1, 0.1, 0.1, 0.1)
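The random-cropping step above can be sketched as follows (a minimal illustration in NumPy, not the repo's actual code; the `color_jitter_params=(0.1, 0.1, 0.1, 0.1)` tuple corresponds to brightness/contrast/saturation/hue jitter strengths in the style of torchvision's `ColorJitter`):

```python
import numpy as np

def random_crop_pair(image, mask, size=256):
    """Crop the same random size x size window from an image and its mask.
    Minimal sketch of the 'randomly cropping 256 x 256 patches' step;
    assumes H x W (x C) arrays at least `size` pixels in each spatial dim."""
    h, w = image.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    # Use the same (top, left) for both so image and mask stay aligned.
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])
```

Cropping image and mask with one shared offset is the essential detail: cropping them independently would silently destroy the pixel-wise correspondence the segmentation loss depends on.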

Unfortunately, I don't remember the number of epochs at all. However, I used a learning rate scheduler and reduced the LR when the validation loss was plateauing.
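The "reduce the LR when the validation loss plateaus" idea maps directly onto PyTorch's built-in `ReduceLROnPlateau` scheduler. A minimal sketch, where the model, optimizer, and validation loss are placeholders rather than the actual training loop:

```python
import torch

# Stand-in model; in practice this would be the U-Net.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cut the LR by 10x after `patience` epochs without val_loss improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(100):
    # train_one_epoch(model, optimizer)  # hypothetical training step
    val_loss = 1.0                       # placeholder validation loss
    scheduler.step(val_loss)             # scheduler watches val_loss
```

Unlike fixed-step schedules, this reacts to the validation curve itself, which is useful exactly when val_loss is noisy or plateaus at an unpredictable epoch.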

Hope that helps!

cosmic-cortex avatar Feb 15 '21 08:02 cosmic-cortex