Dynamic Z-score normalisation and mixed-resolution training for sensor-agnostic models
Summary
Add two transforms to TorchGeo that enable sensor-agnostic training and inference:
- Dynamic Z-Score Normalisation: Normalises each image patch independently, removing the need for pre-computed dataset statistics. Applied during both training and inference.
- Mixed Resolution Training: Randomly resamples training batches to different spatial resolutions, teaching models to handle multiple sensor resolutions.
This enables a single model to work across different satellites, processing levels, and resolutions without retraining. It also reduces computational overhead and allows faster inference through downscaling.
The effectiveness of these approaches has been demonstrated in the OmniCloudMask and OmniWaterMask papers, where Sentinel-2-trained models achieved strong performance on data from other sensors.
Rationale
Current remote sensing workflows require training separate models for each satellite sensor, spatial resolution, and processing level. This creates significant operational overhead: researchers and practitioners must maintain multiple models, collect sensor-specific training data, and compute normalisation statistics for each dataset.
When applying models across different sensors (e.g., a Sentinel-2 trained model to Landsat data), performance typically degrades substantially because models are sensitive to differences in spectral response, spatial resolution, and data processing levels.
These limitations become problematic in operational settings where:
- Multiple satellite data sources need to be processed consistently
- Quick deployment to new sensors is required without retraining
- Training data only exists for a particular sensor but the model needs to work on others
The proposed transforms address these pain points by enabling sensor-agnostic models that maintain performance across different satellites, resolutions, and processing levels without requiring sensor-specific training data or normalisation statistics.
Implementation
The OmniCloudMask repo contains fastai implementations of these approaches that can serve as a reference.
Dynamic Z-Score Normalisation Transform:
- Accept no_data_value parameter (default 0)
- Compute per-channel statistics excluding no-data pixels
- Apply normalisation independently to each image and channel in the batch
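A minimal sketch of what this could look like, assuming a float `(B, C, H, W)` batch (the function name and the decision to zero out no-data pixels afterwards are my assumptions, not the OmniCloudMask implementation):

```python
import torch

def dynamic_zscore(batch: torch.Tensor, no_data_value: float = 0.0,
                   eps: float = 1e-6) -> torch.Tensor:
    # Mask of valid pixels, per image and per channel.
    valid = batch != no_data_value
    count = valid.sum(dim=(2, 3), keepdim=True).clamp(min=1)
    # Per-image, per-channel mean and std computed over valid pixels only.
    mean = (batch * valid).sum(dim=(2, 3), keepdim=True) / count
    var = ((batch - mean) * valid).pow(2).sum(dim=(2, 3), keepdim=True) / count
    std = var.sqrt().clamp(min=eps)
    out = (batch - mean) / std
    # Design choice (assumption): zero out no-data pixels after normalisation.
    return torch.where(valid, out, torch.zeros_like(out))
```

Because the statistics are computed on the fly per patch, the same transform can be applied unchanged at training and inference time, regardless of sensor.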
Mixed Resolution Training Transform:
- Parameters: min_scale, max_scale, image_interpolation_modes (list), clip_to_min_size
- Sample random scale factor per batch during training
- Apply the same scale to image and mask (for segmentation tasks)
- Use appropriate interpolation: random selection from list for images, nearest for masks
- Optional size clipping for consistent batch dimensions
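A rough sketch of a batch-level version using the parameter names from the list above (images assumed `(B, C, H, W)`, masks `(B, H, W)` integer labels; this is an illustration, not the reference implementation):

```python
import random
import torch
import torch.nn.functional as F

def mixed_resolution(images: torch.Tensor, masks: torch.Tensor,
                     min_scale: float = 0.2, max_scale: float = 1.0,
                     image_interpolation_modes=("bilinear", "bicubic"),
                     clip_to_min_size: bool = False):
    # One random scale per batch, so all samples share the same output shape.
    scale = random.uniform(min_scale, max_scale)
    mode = random.choice(image_interpolation_modes)
    h, w = images.shape[-2:]
    size = (max(1, round(h * scale)), max(1, round(w * scale)))
    images = F.interpolate(images, size=size, mode=mode,
                           antialias=mode in ("bilinear", "bicubic"))
    # Nearest interpolation for masks to keep labels categorical.
    masks = F.interpolate(masks.unsqueeze(1).float(), size=size,
                          mode="nearest").squeeze(1).long()
    if clip_to_min_size:
        # Crop to the smallest possible output size so spatial dims stay
        # constant across batches (useful for fixed-input-size models).
        min_h, min_w = max(1, round(h * min_scale)), max(1, round(w * min_scale))
        images = images[..., :min_h, :min_w]
        masks = masks[..., :min_h, :min_w]
    return images, masks
```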
I'm happy to contribute implementation and tests if the maintainers are interested.
- Dynamic Z-Score Normalisation Transform: definitely interested in this. While you could argue that this applies to all of computer vision, not just remote sensing, I'm not sure if Kornia/torchvision would be interested in such a transform. Let's add it to TorchGeo first.
- Mixed Resolution Training Transform: how is this different from kornia.augmentation.RandomResizedCrop?
Hi Adam, thanks for the fast reply!
Correct me if I'm wrong, but RandomResizedCrop takes, for example, a 512x512 input image, selects a random crop of say 256x256, then interpolates the crop back to the original image size, reducing the spatial context and stretching each pixel, right?
The proposed mixed resolution augmentation would take the same 512x512 image and downsample the entire image to 256x256, effectively halving the spatial resolution, like turning 10m pixels into 20m pixels. So you keep the same geographic extent but simulate what happens when you move from high-resolution to lower-resolution sensors.
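For concreteness, the proposed behaviour is essentially a plain whole-image downsample (sketch, assuming a `(B, C, H, W)` tensor):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 512, 512)  # e.g. a patch at 10 m resolution

# RandomResizedCrop: crop a sub-region, then resize it back to 512x512
# (same output size, reduced spatial extent, "stretched" pixels).

# Proposed transform: downsample the whole image to 256x256, keeping the
# full geographic extent but simulating ~20 m pixels.
coarser = F.interpolate(x, scale_factor=0.5, mode="bilinear", antialias=True)
print(coarser.shape)  # torch.Size([1, 3, 256, 256])
```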
Admittedly, this is a bit odd since the spatial dimensions of the batch change during augmentation. For many models that handle variable spatial dimensions this works fine, but for some transformer models it may not, which is why I suggested the optional clipping to enforce consistent output size.
Does that make sense?
Ah, so it's more like kornia.augmentation.Resize, but randomly applied to some images, not all. @isaaccorley does Kornia or torchvision have any kind of "randomly apply some augmentations" wrapper we could use?
Yeah, exactly. For OmniCloudMask we applied this at the batch level to ensure all items in the batch ended up with the same spatial dimensions after the transform. We applied it to all batches so that every resolution from 9 m to 50 m was evenly sampled. We chose this approach because it was straightforward to implement and it's a relatively cheap transform to apply to a batch on the GPU. As a result, the spatial dimensions change from batch to batch.
An alternative approach would be to simulate a range of resolutions within a single batch; however, this would require clipping the samples within a batch to the smallest (lowest-resolution) size, which would lose some spatial extent for the larger (higher-resolution) samples.
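Purely to illustrate the batch-level usage pattern (reusing the mixed_resolution sketch from the Implementation section; model, criterion, optimizer, and train_loader are assumed placeholders):

```python
for images, masks in train_loader:
    # One random scale per batch: spatial dims are consistent within a
    # batch but vary between batches.
    images, masks = mixed_resolution(images, masks, min_scale=0.2, max_scale=1.0)
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()
```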
I don't think Kornia has a random resize, but torchvision does: https://docs.pytorch.org/vision/0.22/generated/torchvision.transforms.v2.RandomResize.html. We could use RandomApply and then have a list of interpolation methods as well.
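For reference, one possible composition with torchvision's v2 API (untested sketch; the size range and probability are arbitrary):

```python
import torch
from torchvision.transforms import InterpolationMode
from torchvision.transforms import v2

# Pick one of several interpolation methods at random, then apply the
# random resize to only ~50% of inputs.
random_resize = v2.RandomChoice([
    v2.RandomResize(min_size=128, max_size=512,
                    interpolation=InterpolationMode.BILINEAR, antialias=True),
    v2.RandomResize(min_size=128, max_size=512,
                    interpolation=InterpolationMode.BICUBIC, antialias=True),
])
augment = v2.RandomApply([random_resize], p=0.5)

img = torch.rand(3, 512, 512)
out = augment(img)  # either unchanged (3x512x512) or randomly resized
```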