Loss value cannot be reduced during training
Hello, I use the following parameters in my configuration. The data is the BraTS17 dataset, with volume size 240×240×155. The problem is that the loss value stays around 0.8 during training, with no significant drop after the 20,000 iterations completed. How can I solve this?
[T1]
path_to_search = /home/jzd/NiftyNet-11dev/jia/t1/HGG-t1, /home/jzd/NiftyNet-11dev/jia/t1/LGG-t1
filename_contains = t1
filename_not_contains =
spatial_window_size = (64, 64, 64)
interp_order = 3
axcodes=(A, R, S)
[label]
path_to_search = /home/jzd/NiftyNet-11dev/jia/t1/HGG-t1,/home/jzd/NiftyNet-11dev/jia/t1/LGG-t1
filename_contains = GlistrBoost
filename_not_contains =
spatial_window_size = (64, 64, 64)
interp_order = 0
axcodes=(A, R, S)
############################## system configuration sections
[SYSTEM]
cuda_devices = ""
num_threads = 2
num_gpus = 1
model_dir = /home/jzd/NiftyNet-11dev/jia/t1/model
[NETWORK]
name = vnet
activation_function = prelu
batch_size = 1
decay = 0.1
reg_type = L2
volume_padding_size = 0
histogram_ref_file = /home/jzd/NiftyNet-11dev/jia/t1/standardisation_models.txt
norm_type = percentile
cutoff = (0.01, 0.99)
normalisation = False
whitening = False
normalise_foreground_only = True
foreground_type = threshold_plus
multimod_foreground_type = and
queue_length = 6
window_sampling = resize
[TRAINING]
sample_per_volume = 100
rotation_angle = (-10.0, 10.0)
scaling_percentage = (-10.0, 10.0)
random_flipping_axes = 1
lr = 0.00001
loss_type = Dice
starting_iter = 0
save_every_n = 500
max_iter = 20000
max_checkpoints = 2000
[INFERENCE]
border = 0
inference_iter = -1
save_seg_dir = /home/jzd/NiftyNet-11dev/jia/t1/segout
output_interp_order = 3
spatial_window_size = (64, 64, 64)
############################ custom configuration sections
[SEGMENTATION]
image = T1
label = label
output_prob = True
num_classes = 5
label_normalisation = True
I have the same problem with Dice loss/GDSC.
Maybe your window size is too small. Check my tutorial: https://gist.github.com/fepegar/1fb865494cb44ac043c3189ec415d411
Also, you could start by trying to fit only one image instead of the whole dataset: https://twitter.com/karpathy/status/1013244313327681536
I modified the file as per https://gist.github.com/fepegar/1fb865494cb44ac043c3189ec415d411, but a new error occurred:
ValueError: Dimension 1 in both shapes must be equal, but are 25 and 26 for 'worker_0/UNet /residual_concat/concat' (op: 'ConcatV2') with input shapes: [1,25,25,25,256], [1,26,26,26,512], [] and with computed input tensors: input[2] = < -1>.
My file parameters are as follows:
[T1]
path_to_search = /home/jzd/NiftyNet-11dev/jia/t1/HGG-t1, /home/jzd/NiftyNet-11dev/jia/t1/LGG-t1
filename_contains = t1
filename_not_contains =
spatial_window_size = (100, 100, 100)
interp_order = 3
pixdim=(1.0, 1.0, 1.0)
axcodes=(A, R, S)
[label]
path_to_search = /home/jzd/NiftyNet-11dev/jia/t1/HGG-t1,/home/jzd/NiftyNet-11dev/jia/t1/LGG-t1
filename_contains = GlistrBoost
filename_not_contains =
spatial_window_size = (100, 100, 100)
interp_order = 0
pixdim=(1.0, 1.0, 1.0)
axcodes=(A, R, S)
############################## system configuration sections
[SYSTEM]
cuda_devices = ""
num_threads = 2
num_gpus = 1
model_dir = /home/jzd/NiftyNet-11dev/jia/t1/model
[NETWORK]
name = unet
activation_function = prelu
batch_size = 1
decay = 0.1
reg_type = L2
# volume level preprocessing
volume_padding_size = (42, 42, 42)
# histogram normalisation
histogram_ref_file = /home/jzd/NiftyNet-11dev/jia/t1/standardisation_models.txt
norm_type = percentile
cutoff = (0.01, 0.99)
normalisation = False
whitening = False
normalise_foreground_only = True
foreground_type = threshold_plus
multimod_foreground_type = and
queue_length = 6
window_sampling = resize
[TRAINING]
sample_per_volume = 100
rotation_angle = (-10.0, 10.0)
scaling_percentage = (-10.0, 10.0)
random_flipping_axes= 1
lr = 0.00001
loss_type = Dice
starting_iter = 0
save_every_n = 500
max_iter = 20000
max_checkpoints = 2000
[INFERENCE]
border = (42, 42, 42)
inference_iter = -1
save_seg_dir = /home/jzd/NiftyNet-11dev/jia/t1/segout
output_interp_order = 3
spatial_window_size = (100, 100, 100)
#### [EVALUATION]
#### evaluations=Dice
############################ custom configuration sections
[SEGMENTATION]
image = T1
label = label
##### output_prob = False
num_classes = 5
label_normalisation = True
The input size has to be a multiple of 8.
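A small sketch (not NiftyNet code, and a simplification of the actual NiftyNet UNet arithmetic) illustrating why the constraint exists: each 2×2×2 pooling halves the spatial size with flooring, and the upsampled decoder feature map must exactly match the encoder skip connection for the concat to succeed. A size like 100 breaks at the third level, which is the kind of mismatch the ConcatV2 error above reports.

```python
# Simplified model of a U-Net-style encoder/decoder with `depth`
# pooling steps, assuming same-padding convolutions. Input sizes must
# be divisible by 2**depth, or a skip-connection concat will fail.

def check_unet_size(size, depth=3):
    """Return True if `size` survives `depth` pool/upsample rounds
    without a skip-connection shape mismatch."""
    sizes = [size]
    for _ in range(depth):
        sizes.append(sizes[-1] // 2)       # pooling floors odd sizes
    for level in range(depth, 0, -1):
        upsampled = sizes[level] * 2       # upsampling doubles the size
        if upsampled != sizes[level - 1]:  # must equal the skip tensor
            return False
    return True

print(check_unet_size(100))  # False: 100 -> 50 -> 25 -> 12, and 12*2 != 25
print(check_unet_size(96))   # True: 96 -> 48 -> 24 -> 12 and back up cleanly
```

So a spatial_window_size such as (96, 96, 96) satisfies the multiple-of-8 rule, while (100, 100, 100) does not.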
Actually, using cross-entropy always works for my network, but whenever I use the Dice loss or GDSC loss, the training loss just doesn't seem to decrease much, no matter how many iterations I train. It may have to do with the learning rate, but I'm not very sure.
@YilinLiu97 Thank you for your reply. I would like to learn from your configuration parameters; could you publish your files? Thank you.
@YilinLiu97 Sounds like a weight initialisation issue, or perhaps you could use a truncated version of the Dice loss (https://github.com/NifTK/NiftyNet/issues/34#issuecomment-360842663).
I have the same problem. The loss is around 0.3 and cannot decrease. Have you solved it? @1160914483
@ChenxiCui97 I tried several approaches, such as replacing the loss function, increasing the number of iterations, and changing the window size and the optimizer, but the effect was not obvious. Could you email me the parameters of your configuration file for reference? I am particularly interested in your loss of 0.3.
@1160914483 I think this is because the datasets we used are different. I have seen your configuration and there are some differences: I use a window of (128, 128, 120) and didn't use rotation. The rest of the parameters are the same as yours.
@ChenxiCui97 Thank you for your reply. The grey-scale range of the data I used was too large, and maybe that affected the results. I am preprocessing the data and hope it will improve them.
I have the same problem in my case, where I need to segment two foreground classes from the background. The Dice loss decreases to around 0.67. For me, the problem is that the two foreground classes take up only a tiny portion of the total volume, leading to severe class imbalance. When computing the Dice loss, the Dice coefficients for the three classes are ~1, ~0, ~0, so the Dice loss is about 1 - 1/3 ≈ 0.67.
In your case, you have 5 classes. I guess the 4 foreground classes are also small compared with the background class, so they are not well segmented and the Dice loss is about 1 - 1/5 = 0.8.
Although the Dice loss stays around ~0.67, I do observe that at early stages the two foreground classes are absent from the results, while at later iterations I am able to observe them.
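The arithmetic above can be reproduced with a toy example. This is a generic soft-Dice sketch with made-up class proportions (99% background, two tiny foreground classes), not NiftyNet's implementation: when the network predicts background everywhere, the mean multi-class Dice loss settles near 1 - 1/3.

```python
import numpy as np

def soft_dice_loss(probs, labels, eps=1e-7):
    """Mean soft Dice loss over classes.
    probs:  (voxels, classes) predicted probabilities
    labels: (voxels, classes) one-hot ground truth
    """
    intersect = (probs * labels).sum(axis=0)
    denom = probs.sum(axis=0) + labels.sum(axis=0)
    dice_per_class = (2.0 * intersect + eps) / (denom + eps)
    return 1.0 - dice_per_class.mean()

n = 100_000
labels = np.zeros((n, 3))
labels[:99_000, 0] = 1           # 99% background
labels[99_000:99_500, 1] = 1     # two tiny foreground classes
labels[99_500:, 2] = 1

probs = np.zeros((n, 3))
probs[:, 0] = 1.0                # network predicts background everywhere

print(soft_dice_loss(probs, labels))  # close to 0.67 = 1 - 1/3
```

The background Dice is ~1 and the two foreground Dice scores are ~0, so the mean Dice is ~1/3 and the loss plateaus near 0.67 even though the network has learned nothing about the foreground.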
Ground truth segmentations from the BraTS 2018 dataset have 4 classes (0, 1, 2, and 4) when inspecting those files with ITK-SNAP. Class 0 (black) is the background, whereas 1, 2, and 4 correspond to the different tumour structures (core, etc.).
Just to make sure: is it expected, then, that the Dice loss hovers around a certain value during training - in this case 1 - 1/4 = 0.75 - even after 30k iterations?
Even when I use "CrossEntropy" as loss_type, the loss converges to around a certain value, but one much higher (~1.25) than with "Dice".