
ValueError: y_pred and y should have same shapes

Open MOMOANNIE opened this issue 1 year ago • 18 comments

Hello, because my labels are very small, I modified the data transforms for the flare dataset, changing source_key="image" in CropForegroundd to source_key='label'. When I train, the shapes of y and y_pred are inconsistent. How should I solve this problem?

```python
elif dataset == 'flare':
    train_transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            AddChanneld(keys=["image", "label"]),
            Spacingd(keys=["image", "label"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear", "nearest")),
            # ResizeWithPadOrCropd(keys=["image", "label"], spatial_size=(256,256,128), mode=("constant")),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            # Changed from source_key="image": image and label are both cropped
            # to the bounding box of the label foreground (voxels > 0).
            CropForegroundd(keys=["image", "label"], source_key='label', select_fn=lambda x: x > 0, margin=0),
            RandCropByPosNegLabeld(
                keys=["image", "label"],
                label_key="label",
                spatial_size=(96, 96, 96),
                pos=1,
                neg=1,
                num_samples=crop_samples,
                allow_smaller=True,
            ),
            RandShiftIntensityd(
                keys=["image"],
                offsets=0.10,
                prob=0.50,
            ),
            RandAffined(
                keys=['image', 'label'],
                mode=('bilinear', 'nearest'),
                prob=1.0, spatial_size=(96, 96, 96),
                rotate_range=(0, 0, np.pi / 30),
                scale_range=(0.1, 0.1, 0.1)),
            ToTensord(keys=["image", "label"]),
        ]
    )

    val_transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            AddChanneld(keys=["image", "label"]),
            Spacingd(keys=["image", "label"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear", "nearest")),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            # Same label-based crop as in training.
            CropForegroundd(keys=["image", "label"], source_key='label', select_fn=lambda x: x > 0, margin=0),
            ToTensord(keys=["image", "label"]),
        ]
    )

    test_transforms = Compose(
        [
            LoadImaged(keys=["image"]),
            AddChanneld(keys=["image"]),
            Spacingd(keys=["image"], pixdim=(
                1.0, 1.0, 1.2), mode=("bilinear")),
            # ResizeWithPadOrCropd(keys=["image"], spatial_size=(168,168,128), mode=("constant")),
            Orientationd(keys=["image"], axcodes="RAS"),
            ScaleIntensityRanged(
                keys=["image"], a_min=-125, a_max=275,
                b_min=0.0, b_max=1.0, clip=True,
            ),
            # No label available at test time, so crop on the image itself.
            CropForegroundd(keys=["image"], source_key="image"),
            ToTensord(keys=["image"]),
        ]
    )
```

The error that occurs is as follows:

```
Traceback (most recent call last):
  File "/home/3DUX-Net/main_train.py", line 257, in <module>
    global_step, dice_val_best, global_step_best = train(
  File "/home/3DUX-Net/main_train.py", line 217, in train
    dice_val = validation(epoch_iterator_val)
  File "/home/3DUX-Net/main_train.py", line 170, in validation
    dice_metric(y_pred=val_output_convert, y=val_labels_convert)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 329, in __call__
    ret = super().__call__(y_pred=y_pred, y=y)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 68, in __call__
    return self._compute_list(y_pred, y)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 90, in _compute_list
    ret = [self._compute_tensor(p.detach().unsqueeze(0), y_.detach().unsqueeze(0)) for p, y_ in zip(y_pred, y)]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/metric.py", line 90, in <listcomp>
    ret = [self._compute_tensor(p.detach().unsqueeze(0), y_.detach().unsqueeze(0)) for p, y_ in zip(y_pred, y)]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/meandice.py", line 81, in _compute_tensor
    return compute_meandice(
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/metrics/meandice.py", line 136, in compute_meandice
    raise ValueError(f"y_pred and y should have same shapes, got {y_pred.shape} and {y.shape}.")
ValueError: y_pred and y should have same shapes, got torch.Size([1, 2, 138, 62, 145]) and torch.Size([1, 2, 138, 62, 130]).
```
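
A quick way to catch such mismatches before training is to compare the raw file shapes directly. A minimal sketch (here `data_dicts` is a placeholder for the list of `{"image": ..., "label": ...}` path dictionaries used to build the dataset):

```python
import nibabel as nib

# Compare raw on-disk shapes for every image/label pair before any transforms run.
for d in data_dicts:  # placeholder name for the dataset's path dictionaries
    img_shape = nib.load(d["image"]).shape
    lbl_shape = nib.load(d["label"]).shape
    if img_shape != lbl_shape:
        print(f"Mismatch: {d['image']} {img_shape} vs {d['label']} {lbl_shape}")
```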

MOMOANNIE avatar Jul 11 '23 06:07 MOMOANNIE

Hi, sorry for the late reply from the previous thread. Looking at the screenshots of the image and the label, they do not seem to have the same dimensions. Can you first make sure they are the same size and can be overlaid on each other? The error says that your predicted output does not have the same size as your label. Since the predicted output has the same size as the input image, the input image and the corresponding label must be mismatched.

leeh43 avatar Jul 11 '23 20:07 leeh43

> Hi, sorry for the late reply from the previous thread. Looking at the screenshots of the image and the label, they do not seem to have the same dimensions. Can you first make sure they are the same size and can be overlaid on each other? The error says that your predicted output does not have the same size as your label. Since the predicted output has the same size as the input image, the input image and the corresponding label must be mismatched.

OK, I checked the image and label sizes in the validation set; they are the same size and overlay each other. (screenshots)

Training passes normally, but when validation computes the dice_metric, it reports an error. (screenshot of the error)

MOMOANNIE avatar Jul 12 '23 01:07 MOMOANNIE

Yes, because Dice computes the overlap ratio between the predictions and the corresponding ground-truth label. If the image dimensions are 387 x 387 x 491, the output should also be 387 x 387 x 491, not 138 x 62 x 145.
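
As a concrete illustration of that overlap ratio (a minimal sketch, not code from the repo), for binary masks the metric reduces to:

```python
import torch

# Dice = 2*|A ∩ B| / (|A| + |B|) for a binary prediction and label.
# The element-wise product is why both tensors must have the same shape.
def dice(pred: torch.Tensor, gt: torch.Tensor) -> float:
    assert pred.shape == gt.shape, "y_pred and y should have same shapes"
    intersection = (pred * gt).sum()
    return (2.0 * intersection / (pred.sum() + gt.sum())).item()
```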

If your label is too small, please use monai.transforms.RandCropd. More details can be found here: https://docs.monai.io/en/stable/transforms.html
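
For illustration, a minimal example from that random-crop family (a sketch using RandSpatialCropd, with roi_size matched to the patch size already used above):

```python
from monai.transforms import RandSpatialCropd

# Takes one fixed-size random patch from both image and label.
crop = RandSpatialCropd(
    keys=["image", "label"],
    roi_size=(96, 96, 96),  # same patch size as the training transforms above
    random_size=False,      # always crop exactly roi_size
)
```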

leeh43 avatar Jul 12 '23 03:07 leeh43

> Yes, because Dice computes the overlap ratio between the predictions and the corresponding ground-truth label. If the image dimensions are 387 x 387 x 491, the output should also be 387 x 387 x 491, not 138 x 62 x 145.
>
> If your label is too small, please use monai.transforms.RandCropd. More details can be found here: https://docs.monai.io/en/stable/transforms.html

Haha, I found the mistake: the number of my labels differed from the number of images. Training runs normally now; looking forward to the results.

MOMOANNIE avatar Jul 12 '23 05:07 MOMOANNIE

Great, let's see whether the results make sense. Feel free to ask questions here and I will try to reply ASAP.

leeh43 avatar Jul 12 '23 20:07 leeh43

> Great, let's see whether the results make sense. Feel free to ask questions here and I will try to reply ASAP.

Haha, OK, thank you very much for patiently answering my questions. The result with batchsize=1 is not very good (Best Avg. Dice is only 0.68+). I will now set the batchsize to 4 and see whether the training results improve.

MOMOANNIE avatar Jul 13 '23 02:07 MOMOANNIE

It seems you are segmenting lesions, which are pretty small; a smaller kernel size may work better. You could try kernel_size = 5.
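
For reference, the pattern behind that change (an illustrative sketch, not the repo's actual layer definition): a depthwise 3D convolution keeps its output spatial size when padding = kernel_size // 2, so shrinking the kernel from 7 to 5 means dropping the padding from 3 to 2.

```python
import torch.nn as nn

kernel_size = 5  # was 7
dwconv = nn.Conv3d(
    in_channels=48, out_channels=48,  # 48 channels is an example width
    kernel_size=kernel_size,
    padding=kernel_size // 2,  # 2 for k=5, 3 for k=7: preserves spatial dims
    groups=48,                 # depthwise: one filter per input channel
)
```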

leeh43 avatar Jul 13 '23 02:07 leeh43

> It seems you are segmenting lesions, which are pretty small; a smaller kernel size may work better. You could try kernel_size = 5.

Yes, I am segmenting nodules, and some nodules are very small. This is the loss curve when I set the batchsize to 4. (screenshot) In the UXNET class of network.backbone.py, should the kernel_size of all the encoders and decoders be set to 2, as shown in the figure below? (screenshot)

MOMOANNIE avatar Jul 13 '23 06:07 MOMOANNIE

The encoder here is used to transfer high-level details for decoding, rather than for extracting meaningful features. The meaningful features are extracted in the convolution block (ux_block) in uxnet_encoder.py.

I am wondering how many samples you use for training, validation, and testing?

leeh43 avatar Jul 13 '23 06:07 leeh43

> The encoder here is used to transfer high-level details for decoding, rather than for extracting meaningful features. The meaningful features are extracted in the convolution block (ux_block) in uxnet_encoder.py.
>
> I am wondering how many samples you use for training, validation, and testing?

OK, is it the kernel_size here that changes from 7 to 5? (screenshot)

I use 631 cases of 3D data as the training set, 70 cases for validation, and 230 cases for testing.

MOMOANNIE avatar Jul 13 '23 07:07 MOMOANNIE

Right, the kernel_size changes to 5.

Also, with 230 cases in the test set there may be a lot of outliers, since the ground-truth labels are so small.

leeh43 avatar Jul 13 '23 07:07 leeh43

> Right, the kernel_size changes to 5.

OK, I'll try changing it to 5 later.

> Also, with 230 cases in the test set there may be a lot of outliers, since the ground-truth labels are so small.

Yes, the test set may have a lot of outliers, and some lesions may be only about 3 mm in diameter. I haven't used the test set yet; I will run it after training.

I set the batchsize to 4, and the loss curve is not very ideal. (screenshots of the loss curves)

MOMOANNIE avatar Jul 13 '23 07:07 MOMOANNIE

Yes, your training curve fluctuates quite a bit, which means the task is not easy to learn and there is probably a lot of variation in your training samples. Instead of just changing the batch size, you also need to decrease the learning rate.

However, this is just hyperparameter tuning. If you want to make things more efficient, you may need to look deeply into the data and select the data that is good for training, instead of directly throwing 600 scans at the model. More data doesn't mean it is learnable for models. First "artificial", then "intelligence".
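
As a sketch of that last knob (the training logs later in this thread show AdamW at 1e-04 and 1e-05; `model` here is a placeholder):

```python
import torch

# Decrease the AdamW learning rate; 1e-5 is an example value, tune as needed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```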

leeh43 avatar Jul 13 '23 08:07 leeh43

> Yes, your training curve fluctuates quite a bit, which means the task is not easy to learn and there is probably a lot of variation in your training samples. Instead of just changing the batch size, you also need to decrease the learning rate.

Yes, I just ran batchsize=4; the result is indeed worse than batchsize=1, with a Best Avg. Dice of only 0.650. (screenshots)

Next, I will change the kernel size, batch size, and learning rate to see how the training results turn out.

> you may need to look deeply into the data and select the data that is good for training

How do you usually do this part of the work?

MOMOANNIE avatar Jul 17 '23 01:07 MOMOANNIE

Good question. For example, if you only have really small lesions in the lung lobe, it may be really difficult for the model to learn such small lesions. You can start with the subjects that have medium/large lesion labels first and see whether the model can learn or not.
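
One way to implement that selection (an illustrative sketch; `train_files` and the voxel-count threshold are assumptions, not from the repo):

```python
import nibabel as nib
import numpy as np

MIN_VOXELS = 500  # example proxy for "medium/large" lesions; tune for your data

selected = [
    d for d in train_files  # train_files: list of {"image": ..., "label": ...} dicts
    if (np.asarray(nib.load(d["label"]).dataobj) > 0).sum() >= MIN_VOXELS
]
```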

leeh43 avatar Jul 17 '23 08:07 leeh43

> Good question. For example, if you only have really small lesions in the lung lobe, it may be really difficult for the model to learn such small lesions. You can start with the subjects that have medium/large lesion labels first and see whether the model can learn or not.

OK, I selected samples with a nodule diameter of 5 mm or more for training, set kernel_size=5 and batchsize=1, and the following problem occurred during training:

```
Loading dataset: 100%|██████████| 631/631 [1:02:16<00:00, 5.92s/it]
Loading dataset: 100%|██████████| 70/70 [05:18<00:00, 4.54s/it]
Chosen Network Architecture: 3DUXNET
Loss for training: DiceCELoss
Optimizer for training: AdamW, learning rate: 1e-05
Maximum Iterations for training: 40000
Training (238 / 40000 Steps) (loss=1.26932):  38%|████      | 239/631 [03:19<05:26, 1.20it/s]
Traceback (most recent call last):
  File "/home/3DUX-Net/main_train.py", line 257, in <module>
    global_step, dice_val_best, global_step_best = train(
  File "/home/3DUX-Net/main_train.py", line 197, in train
    for step, batch in enumerate(epoch_iterator):
  File "/home/.local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 89, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 53, in _apply_transform
    return transform(parameters)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1171, in __call__
    self.randomize(label, fg_indices, bg_indices, image)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1153, in randomize
    self.centers = generate_pos_neg_label_crop_centers(
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/utils.py", line 497, in generate_pos_neg_label_crop_centers
    raise ValueError("No sampling location available.")
ValueError: No sampling location available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/data/dataset.py", line 97, in __getitem__
    return self._transform(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/data/dataset.py", line 807, in _transform
    data = apply_transform(_transform, data)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 113, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.croppad.dictionary.RandCropByPosNegLabeld object at 0x7fc7fc33c880>
```

When I reset the kernel_size to 7, with the same data and batchsize, the same problem appears in the middle of training:

```
Loading dataset: 100%|██████████| 631/631 [28:22<00:00, 2.70s/it]
Loading dataset: 100%|██████████| 70/70 [03:19<00:00, 2.85s/it]
Chosen Network Architecture: 3DUXNET
Loss for training: DiceCELoss
Optimizer for training: AdamW, learning rate: 0.0001
Maximum Iterations for training: 40000
Training (238 / 40000 Steps) (loss=1.43159):  38%|████      | 239/631 [03:19<05:26, 1.20it/s]
Traceback (most recent call last):
  File "/home/project/3DUX-Net/main_train.py", line 257, in <module>
    global_step, dice_val_best, global_step_best = train(
  File "/home/project/3DUX-Net/main_train.py", line 197, in train
    for step, batch in enumerate(epoch_iterator):
  File "/home/.local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 89, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 53, in _apply_transform
    return transform(parameters)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1171, in __call__
    self.randomize(label, fg_indices, bg_indices, image)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1153, in randomize
    self.centers = generate_pos_neg_label_crop_centers(
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/utils.py", line 497, in generate_pos_neg_label_crop_centers
    raise ValueError("No sampling location available.")
ValueError: No sampling location available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/data/dataset.py", line 97, in __getitem__
    return self._transform(index)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/data/dataset.py", line 807, in _transform
    data = apply_transform(_transform, data)
  File "/home/local/miniconda/envs/uxnet/lib/python3.9/site-packages/monai/transforms/transform.py", line 113, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.croppad.dictionary.RandCropByPosNegLabeld object at 0x7fc270415880>
```
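
One plausible cause (an assumption, not something confirmed in this thread): RandCropByPosNegLabeld raises "No sampling location available." when it finds no valid crop centers, which can happen here because CropForegroundd(source_key='label') crops to the label's bounding box, so a label with no foreground voxels leaves nothing to sample from. A quick diagnostic sketch (`train_files` is again a placeholder for the list of path dictionaries):

```python
import nibabel as nib
import numpy as np

# Flag any training case whose label has no foreground voxels; such cases
# give RandCropByPosNegLabeld no valid sampling locations in this pipeline.
for d in train_files:
    lbl = np.asarray(nib.load(d["label"]).dataobj)
    if not (lbl > 0).any():
        print(f"Empty label: {d['label']}")
```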

MOMOANNIE avatar Jul 19 '23 02:07 MOMOANNIE

> Right, the kernel_size changes to 5.

I trained with kernel_size=5; the loss curve during training is not ideal, and the Dice is only 0.68+. (screenshots)

MOMOANNIE avatar Jul 20 '23 05:07 MOMOANNIE

Hi, I'm very interested in your conversation. I study airway segmentation in medical imaging and want to use this network as my baseline, but right now I can only segment the main part of the airway, and I think the problem is related to the small branches. I understand that your labels are also very small, so I would like to ask how you modified the preprocessing, parameters, etc. Looking forward to your reply. Thank you 🙏🙏🙏

chaixiaoyi2 avatar Dec 06 '23 12:12 chaixiaoyi2

I am closing the older bug reports, as these were missed. We are now tracking reports better across the organization. Please re-open if this continues to be a blocker.

BennettLandman avatar Aug 01 '24 16:08 BennettLandman