
val_loss

15793723081 opened this issue 5 years ago • 3 comments

I have a new question. In my training runs, val_loss is very unstable, while train_loss looks good. What could be the reason for this?

val_loss

0.121626744 0.064643869 1.067177767 1.446711094 1.281244414 0.085074004 0.033228467 0.07067455 0.057629818 0.058747965 1.917625544 0.041074573 0.15646584 1.15546266 1.469133962 0.040493928 1.874986998 1.004034303 0.048311408 0.100535154 1.64835466 0.054501295 0.055335191 0.10456412 0.276266118 0.068388779 0.715268846 0.098956134 2.583597354 0.538678182 1.748115192 0.060186028 1.108167116 0.309113808 0.101157197 0.124394018 0.066762745 0.055764281 0.460467963 2.378881313 0.066589679 0.471164338 0.078409711 0.065184287 0.086847927 0.101323177 0.069627192 1.949108887 0.56550861 1.750528685 1.873376575 1.489622217 0.994260068 0.066217561 0.065698464 0.072816557 0.073953711 0.071316366 0.421468448 0.85330223 0.076412463 1.056728009 0.092980409 0.074839294 0.63967285 1.866399809 1.322311701 0.07251732 0.076650151 0.075991224 0.592627899 0.178139341 0.874911083 0.086961034 2.873830932 0.082318357 0.086678351 0.09311628 0.110618592 0.797192226 0.08203448 0.144572049 0.083447114 0.103286757 0.089605071 0.136912736 0.149895869 0.079707029 0.091769812 0.987303428 0.093305247 0.089239203 1.865140941 1.418952992 0.174026518 0.094368358 0.095077354 0.629811309 0.090620203 1.349854706

train_loss

0.230814192 0.028987918 0.024291774 0.017069094 0.014445363 0.013398573 0.017434838 0.010788895 0.010943851 0.010186778 0.008584953 0.01008794 0.008166692 0.00812426 0.007191894 0.006891105 0.007163724 0.006250243 0.007437692 0.005759141 0.005480012 0.007539251 0.005302504 0.005195955 0.005868624 0.004928016 0.004689556 0.00476825 0.004551805 0.004410871 0.004312812 0.005318373 0.003946777 0.00389254 0.004088193 0.004916692 0.003619704 0.003698593 0.003596091 0.003551835 0.003599238 0.003506489 0.003349912 0.004076907 0.003196912 0.003090051 0.003097881 0.003223821 0.003074504 0.003073007 0.002930183 0.002931052 0.002878063 0.002818113 0.004682063 0.002902587 0.00261943 0.002597745 0.002628764 0.002631262 0.003391225 0.002469422 0.002454039 0.002499085 0.002512494 0.00248172 0.002464812 0.002398441 0.002394155 0.002346173 0.002301199 0.002282861 0.002268845 0.002215712 0.002189308 0.002250406 0.002511505 0.002021703 0.00206177 0.002084404 0.002078039 0.00203906 0.002020018 0.001995647 0.001981924 0.001943689 0.001929661 0.001958759 0.001993959 0.001822729 0.001834546 0.001916094 0.001810232 0.001773823 0.001764436 0.001750559 0.001739222 0.002303985 0.001640475 0.001577288

15793723081 avatar Nov 27 '19 12:11 15793723081

Unfortunately, I cannot tell you the exact reason because I don't have enough information, but it seems like your model is overfitting. Do you use the same augmentation during validation as you use during training?
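To make the suggestion concrete: random augmentation (e.g. flips) belongs in the training transform only, while the validation transform should be deterministic. A toy paired transform illustrating the idea (a hypothetical stand-in for this repo's `JointTransform2D`, not its actual implementation):

```python
import random
import numpy as np

class JointFlip:
    """Toy paired transform: flips image and mask together with probability p_flip.
    Hypothetical stand-in for JointTransform2D; the point is that the validation
    transform should use p_flip=0 so validation is deterministic."""
    def __init__(self, p_flip=0.0):
        self.p_flip = p_flip

    def __call__(self, image, mask):
        if random.random() < self.p_flip:
            # Flip image and mask with the SAME decision so they stay aligned.
            image = np.fliplr(image).copy()
            mask = np.fliplr(mask).copy()
        return image, mask

tf_train = JointFlip(p_flip=0.5)  # random augmentation only during training
tf_val = JointFlip(p_flip=0.0)    # no random augmentation during validation
```

With randomness left in the validation transform, the validation loss is computed on a different view of the data every epoch, which by itself makes the val_loss curve noisy.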

cosmic-cortex avatar Nov 28 '19 08:11 cosmic-cortex

Question to @cosmic-cortex

Dear cosmic-cortex,

First of all, many thanks for sharing this implementation of the u-net. It seems to be a great deal of work.

Similar to 15793723081, I am also having problems with overfitting. The network seems not to generalize well.

I downloaded the "Kaggle Data Science Bowl 2018 nuclei detection challenge dataset". I used your script "kaggle_dsb18_preprocessing.py" to prepare the images and the masks. Of the 670 files in the dataset, I used 450 images (with their respective masks) to train the network and 220 images for validation. Please note that I did not separate the fluorescence, brightfield, and histopathological images; I used them all together for training.

For the network, I used depth=6 and width=32 and trained for 100 epochs. I think I am using augmentation, as I kept these two lines:

```python
tf_train = JointTransform2D(crop=crop, p_flip=0.5, color_jitter_params=None, long_mask=True)
tf_val = JointTransform2D(crop=crop, p_flip=0.5, color_jitter_params=None, long_mask=True)
```

Do you have any suggestions/comments on what to do?

Many thanks in advance.

alejandrodelac avatar Feb 07 '21 10:02 alejandrodelac

@alejandrodelac

Hi! I am going to be honest, the last time I trained a U-Net on this dataset was about three years ago, so I don't exactly remember what steps I took to make it generalize well.

If I remember correctly, these were the augmentation steps I used:

  • splitting up the images into 512 x 512 patches in advance
  • randomly cropping 256 x 256 patches
  • color_jitter_params=(0.1, 0.1, 0.1, 0.1)
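The random-cropping step above can be sketched as follows (a minimal illustration in NumPy, not the repo's actual code; the `color_jitter_params=(0.1, 0.1, 0.1, 0.1)` tuple corresponds to brightness/contrast/saturation/hue jitter strengths in the style of torchvision's `ColorJitter`):

```python
import numpy as np

def random_crop_pair(image, mask, size=256):
    """Crop the same random size x size window from an image and its mask.
    Minimal sketch of the 'randomly cropping 256 x 256 patches' step;
    assumes H x W (x C) arrays at least `size` pixels in each spatial dim."""
    h, w = image.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    # Use the same (top, left) for both so image and mask stay aligned.
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])
```

Cropping image and mask with one shared offset is the essential detail: cropping them independently would silently destroy the pixel-wise correspondence the segmentation loss depends on.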

Unfortunately, I don't remember the number of epochs at all. However, I used a learning rate scheduler and reduced the LR when the validation loss was plateauing.
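The "reduce the LR when the validation loss plateaus" idea maps directly onto PyTorch's built-in `ReduceLROnPlateau` scheduler. A minimal sketch, where the model, optimizer, and validation loss are placeholders rather than the actual training loop:

```python
import torch

# Stand-in model; in practice this would be the U-Net.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cut the LR by 10x after `patience` epochs without val_loss improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(100):
    # train_one_epoch(model, optimizer)  # hypothetical training step
    val_loss = 1.0                       # placeholder validation loss
    scheduler.step(val_loss)             # scheduler watches val_loss
```

Unlike fixed-step schedules, this reacts to the validation curve itself, which is useful exactly when val_loss is noisy or plateaus at an unpredictable epoch.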

Hope that helps!

cosmic-cortex avatar Feb 15 '21 08:02 cosmic-cortex