3DUX-Net
ValueError: y_pred and y should have same shapes
Hello, because my label is very small, I modified the data transformation for the FLARE data and changed source_key="image" in CropForegroundd to source_key='label'. When I train, the shapes of y and y_pred are inconsistent. How should I solve this problem?
```python
elif dataset == 'flare':
    train_transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            AddChanneld(keys=["image", "label"]),
            Spacingd(keys=["image", "label"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear", "nearest")),
            # ResizeWithPadOrCropd(keys=["image", "label"], spatial_size=(256,256,128), mode=("constant")),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            CropForegroundd(keys=["image", "label"], source_key='label', select_fn=lambda x: x > 0, margin=0),
            RandCropByPosNegLabeld(
                keys=["image", "label"],
                label_key="label",
                spatial_size=(96, 96, 96),
                pos=1,
                neg=1,
                num_samples=crop_samples,
                allow_smaller=True,
            ),
            RandShiftIntensityd(
                keys=["image"],
                offsets=0.10,
                prob=0.50,
            ),
            RandAffined(
                keys=['image', 'label'],
                mode=('bilinear', 'nearest'),
                prob=1.0, spatial_size=(96, 96, 96),
                rotate_range=(0, 0, np.pi / 30),
                scale_range=(0.1, 0.1, 0.1)),
            ToTensord(keys=["image", "label"]),
        ]
    )
    val_transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            AddChanneld(keys=["image", "label"]),
            Spacingd(keys=["image", "label"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear", "nearest")),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            CropForegroundd(keys=["image", "label"], source_key='label', select_fn=lambda x: x > 0, margin=0),
            ToTensord(keys=["image", "label"]),
        ]
    )
    test_transforms = Compose(
        [
            LoadImaged(keys=["image"]),
            AddChanneld(keys=["image"]),
            Spacingd(keys=["image"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear")),
            # ResizeWithPadOrCropd(keys=["image"], spatial_size=(168,168,128), mode=("constant")),
            Orientationd(keys=["image"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            CropForegroundd(keys=["image"], source_key="image"),
            ToTensord(keys=["image"]),
        ]
    )
```
The error that occurs is as follows:
```
Traceback (most recent call last):
  File "/home/3DUX-Net/main_train.py", line 257, in <module>
    global_step, dice_val_best, global_step_best = train(
  File "/home/3DUX-Net/main_train.py", line 217, in train
    dice_val = validation(epoch_iterator_val)
  File "/home/3DUX-Net/main_train.py", line 170, in validation
    dice_metric(y_pred=val_output_convert, y=val_labels_convert)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 329, in __call__
    ret = super().__call__(y_pred=y_pred, y=y)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 68, in __call__
    return self._compute_list(y_pred, y)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 90, in _compute_list
    ret = [self._compute_tensor(p.detach().unsqueeze(0), y_.detach().unsqueeze(0)) for p, y_ in zip(y_pred, y)]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 90, in <listcomp>
    ret = [self._compute_tensor(p.detach().unsqueeze(0), y_.detach().unsqueeze(0)) for p, y_ in zip(y_pred, y)]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/meandice.py", line 81, in _compute_tensor
    return compute_meandice(
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/meandice.py", line 136, in compute_meandice
    raise ValueError(f"y_pred and y should have same shapes, got {y_pred.shape} and {y.shape}.")
ValueError: y_pred and y should have same shapes, got torch.Size([1, 2, 138, 62, 145]) and torch.Size([1, 2, 138, 62, 130]).
```
Hi, sorry for the late reply from the previous thread. When I look at the screenshots of the image and the label, they do not seem to have the same dimensions. Can you first make sure that they have the same size and can be overlaid on each other? The error above is saying that your predicted output does not have the same size as your label. Since the predicted output has the same size as the input image, the sizes of the input image and the corresponding label do not match.
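A minimal sketch of such a check, assuming the images and labels sit in `imagesTr/` and `labelsTr/` with matching file names (the folder layout is an assumption, not something stated in this thread):

```python
# Hypothetical sanity check: verify that every image has a matching label
# and that each pair has identical voxel dimensions (paths are assumptions).
import os
import nibabel as nib

image_dir, label_dir = "imagesTr", "labelsTr"
images = sorted(os.listdir(image_dir))
labels = sorted(os.listdir(label_dir))

print(f"{len(images)} images, {len(labels)} labels")  # the counts should match

for name in images:
    label_path = os.path.join(label_dir, name)
    if not os.path.exists(label_path):
        print(f"missing label for {name}")
        continue
    img = nib.load(os.path.join(image_dir, name))
    lab = nib.load(label_path)
    if img.shape != lab.shape:
        print(f"shape mismatch for {name}: {img.shape} vs {lab.shape}")
```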
Ok, I checked the image and label sizes of the validation set; they are the same size and can be overlaid on each other.
Training runs normally, but when it enters validation and computes dice_metric, it reports this error.
Yes, because Dice computes the overlap ratio between the predictions and the corresponding ground-truth label. If the image dimension is 387 x 387 x 491, the output should also have the same size of 387 x 387 x 491, instead of 138 x 62 x 145.
If your label is too small, please use monai.transforms.RandCropd. More details can be found here: https://docs.monai.io/en/stable/transforms.html
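As one way to act on that suggestion, the sketch below keeps CropForegroundd driven by the image and instead biases the random patch sampling toward the small label via RandCropByPosNegLabeld, which is already part of the pipeline above (the pos/neg weights and num_samples here are illustrative):

```python
# Sketch: crop to the image foreground as usual, then oversample patches
# centred on the small label rather than cropping the volume to the label.
from monai.transforms import CropForegroundd, RandCropByPosNegLabeld

crop_transforms = [
    CropForegroundd(keys=["image", "label"], source_key="image"),
    RandCropByPosNegLabeld(
        keys=["image", "label"],
        label_key="label",
        spatial_size=(96, 96, 96),
        pos=4,   # illustrative: draw foreground-centred patches 4x more often
        neg=1,
        num_samples=4,
        allow_smaller=True,
    ),
]
```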
Haha, I found where the mistake is: the number of my labels is different from the number of images. I'm training normally now and looking forward to the training results.
Great, let's see if the results make sense or not. Feel free to ask questions here and I will try to reply to you ASAP.
Haha, ok, thank you very much for patiently answering my questions. The result with batch size = 1 is not very good (best avg. Dice is only 0.68+), so I will now set the batch size to 4 and see whether the training result improves.
Seems like you are segmenting lesions, which are pretty small, so maybe a smaller kernel size is better; you may try kernel size = 5.
Yes, I am segmenting nodules, and some nodules are very small. This is the loss curve when I set the batch size to 4:
In the UXNET class of network.backbone.py, should I set the kernel_size of all the encoders and decoders to 2? As shown in the figure below:
The encoder here is used to transfer the high-level details for decoding, rather than to extract meaningful features. The meaningful features are extracted in the convolution block (ux_block) in uxnet_encoder.py.
I am wondering how many samples you use for training, validation, and testing?
Ok, is the kernel_size here changed from 7 to 5?
I use 631 cases of 3D data as the training set, 70 cases in the validation set, and 230 cases in the test set.
right, the kernel_size changes to 5.
Also, you have 230 cases in the test set; there may be a lot of outliers there, as the ground-truth labels are so small.
OK, I'll try changing it to 5 later
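A hedged sketch of what that edit might look like, assuming the depthwise convolution inside ux_block in uxnet_encoder.py is defined with kernel_size=7, padding=3 by default (the exact attribute names may differ in the repo):

```python
# Hypothetical sketch, not the repo's exact code: shrinking the large-kernel
# depthwise convolution from 7 to 5 also requires padding=2 so the spatial
# size is preserved (padding = kernel_size // 2 for odd kernels with stride 1).
import torch.nn as nn

dim = 48  # illustrative channel count
dwconv = nn.Conv3d(dim, dim, kernel_size=5, padding=2, groups=dim)  # was kernel_size=7, padding=3
```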
Yes, the test set may have a lot of outliers; some lesions may only be about 3 mm in diameter. I haven't used the test set yet and will test after training.
I set the batch size to 4, and the loss curve is not very ideal:
Yes, you can see your training curve fluctuates quite a lot, which means it is not easy to learn and there may be a lot of variation in your training samples. Instead of just changing the batch size, you also need to decrease the learning rate.
However, this is just hyperparameter tuning. If you want to make things more efficient, you may need to look deeply into the data and select the data that is good for training, instead of directly throwing 600 scans at the model. More data doesn't mean it is learnable for models. First "artificial", then "intelligence".
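For reference, lowering the learning rate with the AdamW optimizer shown in the training log might look like the sketch below (the value 5e-5 is illustrative rather than a recommendation from the thread, and the Conv3d is only a stand-in for the actual 3D UX-Net model):

```python
# Sketch: reduce the AdamW learning rate when a larger batch size makes the
# loss curve noisy; the values here are illustrative, not tuned settings.
import torch
import torch.nn as nn

model = nn.Conv3d(1, 2, kernel_size=3)  # placeholder for the 3D UX-Net model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=1e-5)
```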
Yes, I just ran batch size = 4, and the result is indeed worse than batch size = 1; the best avg. Dice is only 0.650.
Next, I will change the kernel size, batch size, and learning rate and see how the training results turn out.
"you may need to look deeply into the data and select the data that is good for training" — how do you usually do this part of the work?
Good question. For example, if you only see really small lesions in the lung lobe, it may be really difficult for the model to learn such small lesions. You can start with the subjects that have medium/large lesion labels first and see whether the model can learn or not.
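A rough sketch of that kind of case selection, using the total number of foreground voxels in each NIfTI label as a proxy for lesion size (the folder name and the 500-voxel threshold are illustrative assumptions, not values from this thread):

```python
# Sketch: keep only training cases whose label has a minimum amount of
# foreground, as a proxy for "medium/large" lesions (threshold is illustrative).
import os
import nibabel as nib
import numpy as np

label_dir = "labelsTr"          # assumed folder layout
min_foreground_voxels = 500     # illustrative cut-off

selected = []
for name in sorted(os.listdir(label_dir)):
    lab = nib.load(os.path.join(label_dir, name))
    n_fg = int(np.count_nonzero(np.asanyarray(lab.dataobj)))
    if n_fg >= min_foreground_voxels:
        selected.append(name)

print(f"kept {len(selected)} cases with >= {min_foreground_voxels} foreground voxels")
```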
Ok, I selected samples with a nodule diameter of 5 mm or more for training and set kernel_size=5 and batchsize=1, and the following problem occurred during training:
```
Loading dataset: 100%|██████████████████████████████████████████████████████| 631/631 [1:02:16<00:00, 5.92s/it]
Loading dataset: 100%|████████████████████████████████████████████████████████| 70/70 [05:18<00:00, 4.54s/it]
Chosen Network Architecture: 3DUXNET
Loss for training: DiceCELoss
Optimizer for training: AdamW, learning rate: 1e-05
Maximum Iterations for training: 40000
Training (238 / 40000 Steps) (loss=1.26932): 38%|██████████████▍ | 239/631 [03:19<05:26, 1.20it/s]
Traceback (most recent call last):
  File "/home/3DUX-Net/main_train.py", line 257, in
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
```
When I reset the kernel_size to 7 with the same data and batch size, the following problem appears in the middle of training:
```
Loading dataset: 100%|██████████████████████████████████████████████████████| 631/631 [28:22<00:00, 2.70s/it]
Loading dataset: 100%|████████████████████████████████████████████████████████| 70/70 [03:19<00:00, 2.85s/it]
Chosen Network Architecture: 3DUXNET
Loss for training: DiceCELoss
Optimizer for training: AdamW, learning rate: 0.0001
Maximum Iterations for training: 40000
Training (238 / 40000 Steps) (loss=1.43159): 38%|██████████████▍ | 239/631 [03:19<05:26, 1.20it/s]
Traceback (most recent call last):
  File "/home/project/3DUX-Net/main_train.py", line 257, in
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
```
I trained with kernel_size=5; the loss curve during training is not ideal, and the Dice is only 0.68+.
Hi, I'm very interested in your conversation. I study airway segmentation in medical imaging and want to use this network as my baseline, but right now I can only segment the main part of the airway, and I think that is related to the small parts. I understand that your label is also very small, so I would like to ask how you modified the preprocessing, parameters, etc. Looking forward to your reply. Thank you 🙏🙏🙏
I am closing the older bug reports as these were missed. We are now better tracking reports across the organization. Please re-open if this continues to be a blocker.