MONAI icon indicating copy to clipboard operation
MONAI copied to clipboard

Add a warning for input file shape mismatches

Open MathijsdeBoer opened this issue 2 years ago • 15 comments

Description

When images loaded by LoadImaged do not have the same shape, there is no way of knowing untill this causes issues down the line, which may not happen untill the end of training. With the new default settings of the __init__ function, the warning is shown once for each input path when not all loaded images in a dict are of equal size. This can also be disabled if the user is aware of such a possibility, by setting the argument log_warning to False.

Rationale

I once had to debug a shape mismatch between my image and label inputs where one of my ~60 patients had a faulty label image. This didn't cause issues untill the RandCropByPosNegLabeld transform that I was using threw a shape mismatch error. That on it's own is not a bad hint to the user that something's wrong with the input data, but I initially suspected some of my other augmentation transforms.

Furthermore, this error would not always happen at the same time (due to the shuffling of the training set), not at the same stage (train/val split) or sometimes not until after training when the faulty patient ended up in the test-set. That would be another clue to the fact that one of my entries is wrong, but it would have saved me a few days of debugging if the LoadImaged transform would have just told me about this potential issue from the word go instead.

Status

Work in Progress

Types of changes

  • [x] Non-breaking change (fix or new feature that would not break existing functionality).
  • [ ] Breaking change (fix or new feature that would cause existing functionality to change).
  • [ ] New tests added to cover the changes.
  • [x] Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • [x] Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • [x] In-line docstrings updated.
  • [ ] Documentation updated, tested make html command in the docs/ folder.

Not a breaking change, no new tests available for a print statement, make html did not work on my system. I might be missing some bit of software for that.

MathijsdeBoer avatar May 03 '22 16:05 MathijsdeBoer

I tried to perform the commands exactly as given by your DCO thing, but now I seem to have accidentally signed off on too many commits.

Edit: Except the one I was trying to sign off in the first damn place!

MathijsdeBoer avatar May 03 '22 22:05 MathijsdeBoer

I think you can locally revert to an earlier state with git reflog good luck...

wyli avatar May 03 '22 22:05 wyli

There we go, I guess this is a good a crash course for Git than any.

I ended up following this gist, and that worked out.

MathijsdeBoer avatar May 03 '22 23:05 MathijsdeBoer

Looks like a good addition to me! Thoughts @rijobro @Nic-Ma ?

ericspod avatar May 04 '22 11:05 ericspod

I can certainly see why this would be useful in your use case. But I think the name should be something more specific than log_warning; it's a specific kind of warning, so maybe more like check_image_sizes_match.

From the code (and apologies if I'm wrong on this), I think you check that all dimensions match each other, which would raise warnings if the number of channels between the image and label do not match.

I'm not sure if the default should be True. It's true that if it's set to False then users may not realise they have a problem. But if we set it to True, users with images and labels that are of different sizes intentionally will start seeing warnings.

rijobro avatar May 04 '22 11:05 rijobro

Hi @rijobro @ericspod ,

I agree with your comment, maybe we can call the arg align_spatial_shape? And actually, I feel maybe it's more suitable to put this check in EnsureChannelFirstd transform instead of LoadImaged because images may have unknown dims in LoadImaged transform. What do you think?

Thanks.

Nic-Ma avatar May 04 '22 11:05 Nic-Ma

Honestly, I'm not entirely sure it belongs in any of our transforms. Perhaps we could have a checklist suggesting debugging steps for users, and this could be one of the points?

rijobro avatar May 04 '22 12:05 rijobro

Some good points RE: where this warning should live (if at all), I had indeed forgotten about the potential existance/lack of a channel dim. I'm not sure if a checklist will ever be as valuable as an automatic warning that also provides the user with the exact entry that's causing issues, though.

I do think a form of input checking should be in there somewhere, but if it's not in one of the transforms, I'm not entirely sure where it would fit nicely. Your DataSet classes could handle this on initialization, but that would lock this check to your custom DataSet classes, and would not be usable for custom implementations. Because the transforms in MONAI handle a lot of data wrangling, including the I/O, this does seem to me like the most logical place to fit it in.

An alternative idea I had is to create a new transform, maybe something like CheckDataSanityd, to handle this. But this would be a transform that doesn't actually transform, which might not be preferable. That was a main reason why I plugged it into the LoadImaged transform instead.

MathijsdeBoer avatar May 04 '22 13:05 MathijsdeBoer

Yes I certainly see your point that if we put it in a checklist, less people benefit from it than if we place it in our code. You're right, maybe we just need to put in the warning ...if you're expecting this, disable this warning with sanity_check=False.

We probably also want to check for any(np.isnan(x)), as NaN's can often create a problem for users and the cause isn't always immediately obvious.

edit: Sorry I just noticed that you already have If you do not wish to see this warning, pass log_warning = False to LoadImaged().

rijobro avatar May 04 '22 13:05 rijobro

I'm not sure if the default should be True. It's true that if it's set to False then users may not realise they have a problem. But if we set it to True, users with images and labels that are of different sizes intentionally will start seeing warnings.

I might have to default this to False?

Some CI builds are failing because some of the tests seem to be passing loading dict objects instead of NDArrayOrTensor objects. Those obviously do not have a .shape value for me to collect.

Setting the bool to False would succeed these builds, but I'm not sure that these dicts are intended to appear there in the first place.

This only seems to be failing in some build configurations, though?

MathijsdeBoer avatar May 04 '22 16:05 MathijsdeBoer

Well, I broke the DCO again, but I'd really rather not go back to this mess of accidental branches: image

MathijsdeBoer avatar May 04 '22 16:05 MathijsdeBoer

/black

there's only a merging commit unsigned, all the code changes are signed, so I set DCO check to pass manually.

wyli avatar May 05 '22 05:05 wyli

/build

wyli avatar May 05 '22 06:05 wyli

I want to add some code that checks shapes when the amount of dimensions (image with channel, label without channel) do not match, before I feel comfortable finishing up my pull request.

MathijsdeBoer avatar May 05 '22 07:05 MathijsdeBoer

I want to add some code that checks shapes when the amount of dimensions (image with channel, label without channel) do not match, before I feel comfortable finishing up my pull request.

sorry I thought this PR is ready... I'll remove the approval for now and wait for your further updates.

wyli avatar May 05 '22 10:05 wyli