
USAVars Augmentation maps to 0

Open nilsleh opened this issue 2 years ago • 4 comments

Description

In the USAVars datamodule, the default augmentation from NonGeoDataModule is used. However, the dataset returns uint8 data, and the augmentation output is cast back to uint8, so the normalized float values are truncated to zero. This means you not only get an error when trying to train, but your input images are also all zeros.
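The truncation can be demonstrated with plain PyTorch (a minimal sketch of the effect, not torchgeo's actual code path): dividing uint8 pixel values by 255 gives floats in [0, 1), and casting that back to uint8 floors everything below 1.0 to 0.

```python
import torch

# Pixel values a uint8 image might contain (all below 255).
image = torch.tensor([0, 10, 128, 254], dtype=torch.uint8)

# The default augmentation divides by 255, giving floats in [0, 1).
normalized = image.float() / 255.0

# Casting back to the input's original dtype truncates every value
# below 1.0 down to 0, so the whole image becomes zeros.
truncated = normalized.to(torch.uint8)
print(truncated)  # tensor([0, 0, 0, 0], dtype=torch.uint8)
```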

Steps to reproduce

from torchgeo.datamodules import USAVarsDataModule

dm = USAVarsDataModule(root="path/to/usa_vars", batch_size=16)
dm.setup("fit")
dl = dm.train_dataloader()
batch = next(iter(dl))
aug_batch = dm.aug(batch)
print(aug_batch["image"].max())  # tensor(0, dtype=torch.uint8) -- all zeros, still uint8

Version

'0.5.0.dev0'

nilsleh avatar Jun 20 '23 13:06 nilsleh

I'm also not sure to what extent other current datasets might have the same issue, because it's a silent bug.

nilsleh avatar Jun 20 '23 14:06 nilsleh

This is another reason I want #985. Although I don't think rasterio can statically infer the type of the file, so maybe that wouldn't help. We'd have to carefully annotate each read.

adamjstewart avatar Jun 20 '23 16:06 adamjstewart

I am reopening this because two other people have told me the same thing is happening with their custom datasets. Is there maybe a way we could implement a warning at some stage? https://github.com/microsoft/torchgeo/blob/4da988a9112b72f328e5bee6490e8f7ad3a784f0/torchgeo/transforms/transforms.py#L83 Or some other check, because it's a quietly sneaky phenomenon that takes some digging to understand.
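One possible shape for such a check (a hypothetical sketch, not an existing torchgeo API; the helper name and placement are made up) would be to warn whenever a float result is about to be cast back to an integer dtype:

```python
import warnings

import torch


def cast_back_with_warning(tensor: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Cast *tensor* back to *dtype*, warning if the cast loses information.

    Hypothetical helper: torchgeo's AugmentationSequential does not have this.
    """
    if tensor.is_floating_point() and not dtype.is_floating_point:
        warnings.warn(
            f"Casting float augmentation output back to {dtype} truncates "
            "fractional values; normalized images may become all zeros.",
            UserWarning,
        )
    return tensor.to(dtype)
```

Called in place of the bare `.to(dtype)` cast, this would surface the problem the first time a normalized image is silently converted back to uint8.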

nilsleh avatar Jun 22 '23 12:06 nilsleh

We could certainly warn, but I want to make sure warnings are uncommon. I also really want to get rid of our AugmentationSequential wrapper and instead use the Kornia one. The biggest thing holding us back is https://github.com/kornia/kornia/issues/2119. I started working on this but quickly found it more difficult than I expected.

I think one of the following solutions would be fine:

  • Implement https://github.com/kornia/kornia/issues/2119 and drop our AugmentationSequential
  • Drop our AugmentationSequential and use data_keys everywhere
  • Remove type conversion from our AugmentationSequential and force datasets/datamodules/trainers to handle it
  • Add a warning when we convert dtypes

These are roughly sorted in order from "this is what I want long-term" to "this would be tolerable short-term".
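For the third option, the conversion could be pushed into the dataset or datamodule itself (a sketch under assumed names; the real USAVars transforms differ): convert to float up front, so later augmentations never need to cast the result back to an integer dtype.

```python
import torch


def to_float_image(sample: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Convert the raw uint8 image to float32 in [0, 1] before any
    # augmentation runs, so nothing is ever truncated back to uint8.
    sample["image"] = sample["image"].float() / 255.0
    return sample
```

Applied as a dataset `transforms` callable, this keeps every downstream normalization and augmentation in floating point.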

adamjstewart avatar Jun 22 '23 21:06 adamjstewart