odeon
feature: generic dataset
Pitch
For SSL experiments, we have implemented specific data loaders for several multimodal datasets in our code base.
To limit redundancy, we have designed an abstract dataset class that performs generic operations such as (i) preprocessing the different modalities and (ii) augmenting them with a single augmentation shared across modalities.
(i) The preprocessing includes the following steps (these steps are configured using data classes at the level of the dataset, and at the level of each modality):
- Sampling a crop within the geographical boundaries of each image/patch
- Sampling a date range for the multi-temporal modalities with:
    start_mod_date = self.rng.integers(low=0, high=sizes_date[name_mod] % mod.num_dates + 1)
    # where sizes_date[name_mod] is the total number of available dates
    # and mod.num_dates is the number of time steps in the modality-specific time discretization
- And for each modality:
  - Reading the sampled crop (the file extension can be any of ".tif", ".png", ".jpg", ".jpeg", ".npy", ".h5")
  - If multiple dates are concatenated along the band axis in a tif file, reshaping the array
  - Reading a modality-specific selection of bands in the array
  - Setting array values to NaN where the cloud mask probability is higher than a modality-specific threshold
  - Creating a time discretization based on a dataset-specific function if the array is multi-temporal
  - Applying a log scale transform if a modality-specific boolean is true
  - Normalizing the array based on a modality-specific constant
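To make the per-modality steps concrete, here is a minimal NumPy sketch of the array operations (reshaping, band selection, cloud masking, log scaling, normalization). It is not the code from our code base: `ModalityConfig`, its field names, and `preprocess` are hypothetical, and it assumes that dates are stacked date-major along the band axis, that the log transform is `log(1 + x)`, and that normalization is a division by the constant. File reading and the time discretization are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class ModalityConfig:
    """Hypothetical per-modality settings; field names are illustrative only."""
    bands: Sequence[int]                     # indices of the bands to keep
    dates_in_file: int = 1                   # dates stacked along the band axis
    cloud_threshold: Optional[float] = None  # mask where probability > threshold
    log_scale: bool = False
    norm_constant: float = 1.0


def preprocess(array, cfg, cloud_prob=None):
    """Preprocess one crop.

    array: (dates_in_file * C, H, W) crop read from disk.
    cloud_prob: optional (dates_in_file, H, W) cloud probabilities.
    Returns a float32 array of shape (dates_in_file, len(bands), H, W).
    """
    x = np.asarray(array, dtype=np.float32)
    # Split the stacked band axis into (T, C, H, W), assuming date-major order.
    x = x.reshape(cfg.dates_in_file, -1, *x.shape[-2:])
    # Keep only the configured bands.
    x = x[:, list(cfg.bands)]
    # Set cloudy pixels to NaN in every band of the affected date.
    if cloud_prob is not None and cfg.cloud_threshold is not None:
        cloudy = np.asarray(cloud_prob) > cfg.cloud_threshold
        x = np.where(cloudy[:, None], np.nan, x)
    # Assumed log transform: log(1 + x); NaNs stay NaN.
    if cfg.log_scale:
        x = np.log1p(x)
    # Assumed normalization: division by the modality-specific constant.
    return x / cfg.norm_constant
```

Keeping the settings in a data class per modality means the same function can serve every dataset, with only the configuration changing.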
(ii) The data augmentation is currently a single D4 transform (one of the eight combinations of 90-degree rotations and flips) shared across all modalities. Supporting additional augmentation options would be useful.
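As an illustration of what "shared across all modalities" means, the sketch below draws one D4 element per sample and applies it to the last two axes of every array. The function names are hypothetical and this is not the existing implementation; it assumes each modality is a square, co-registered crop stored as a NumPy array whose last two axes are spatial.

```python
import numpy as np


def sample_d4(rng):
    """Draw one of the 8 D4 elements as (number of 90-degree rotations, flip)."""
    return int(rng.integers(0, 4)), bool(rng.integers(0, 2))


def apply_d4(array, k, flip):
    """Apply a D4 element to the last two (spatial) axes of an array."""
    out = np.rot90(array, k=k, axes=(-2, -1))
    if flip:
        out = np.flip(out, axis=-1)
    return out


def augment_shared(sample, rng):
    """Apply the same randomly drawn D4 element to every modality.

    sample: dict mapping modality name to an array with spatial last axes.
    """
    k, flip = sample_d4(rng)
    return {name: apply_d4(arr, k, flip) for name, arr in sample.items()}


# Usage: two modalities at different resolutions receive the same transform.
rng = np.random.default_rng(0)
sample = {"low_res": np.zeros((3, 4, 4)), "high_res": np.zeros((5, 2, 8, 8))}
augmented = augment_shared(sample, rng)
```

Drawing the element once per sample, rather than once per modality, is what keeps the modalities spatially aligned after augmentation.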