Create test data on the fly
Summary
In order to test our datasets, datamodules, and trainers, we created fake test data for all of our datasets. Later, we started adding data.py scripts to generate this data. We should consider generating these files only at test time to keep the repo small.
Rationale
At this point, the number of fake dataset files (985) greatly exceeds the number of files in TorchGeo itself (190). As long as each file is created only once per test class (class-scoped fixtures), generating them on the fly shouldn't be too time-consuming, and the repo stays minimal.
Implementation
We already have data.py files for most datasets; all we need to do is convert them to pytest fixtures. Shared helper functions for writing common file types (GeoTIFF, PNG, JPEG, HDF5, etc.) would greatly simplify creating fake data and avoid much of the code duplication.
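As a rough sketch of what this could look like, the snippet below pairs one shared file-writing helper with a class-scoped pytest fixture. The names write_fake_png and fake_dataset_root, the train/test directory layout, and the choice of a tiny constant-valued grayscale PNG are all hypothetical and only illustrate the idea; a real helper for GeoTIFF or HDF5 would likely use rasterio or h5py instead of the standard library.

```python
import struct
import zlib
from pathlib import Path

import pytest


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def write_fake_png(path: Path, width: int = 4, height: int = 4, value: int = 0) -> Path:
    """Hypothetical shared helper: write a tiny 8-bit grayscale PNG.

    Every pixel is set to ``value`` (0-255). Parent directories are created.
    """
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods (all 0).
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with a filter-type byte (0 = no filter).
    raw = b"".join(b"\x00" + bytes([value]) * width for _ in range(height))
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(raw))
        + _png_chunk(b"IEND", b"")
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(png)
    return path


@pytest.fixture(scope="class")
def fake_dataset_root(tmp_path_factory):
    """Hypothetical fixture: build the fake dataset once per test class.

    tmp_path_factory is pytest's built-in session-scoped fixture, so it can
    be requested from a class-scoped fixture (the function-scoped tmp_path
    cannot).
    """
    root = tmp_path_factory.mktemp("fake_dataset")
    # Assumed layout for illustration: <root>/<split>/image_<i>.png
    for split in ("train", "test"):
        for i in range(2):
            write_fake_png(root / split / f"image_{i}.png", value=i)
    return root
```

A test class would then request fake_dataset_root as an argument and pass it as the dataset's root directory. Because the fixture is class-scoped, the files are written once and shared by every test method in that class.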
Alternatives
The alternative is to keep using data.py files. This allows us to easily regenerate the data (for example, with larger file sizes) but leads to a lot of code duplication and file storage.
Additional information
No response