torchgeo

Create test data on the fly

Open • adamjstewart opened this issue 3 years ago • 0 comments

Summary

To test our datasets, datamodules, and trainers, we created fake test data for all of our datasets. Later, we started adding data.py scripts to generate this data. We should consider generating these files only at test time, rather than committing them, to keep the repo small.

Rationale

At this point, the fake dataset files (985) far outnumber the files in TorchGeo itself (190). As long as each file is created only once per test class (class scope), generating them on the fly shouldn't be too time-consuming, and it would keep the repo minimal.
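As a rough sketch of what "created only once (class scope)" could look like with pytest, the example below uses a class-scoped fixture built on pytest's `tmp_path_factory`. The names `make_fake_files`, `fake_data_root`, and `TestFakeDataset` are hypothetical and not part of TorchGeo, and the 16-byte `.bin` files are placeholders for real imagery.

```python
from pathlib import Path

import pytest


def make_fake_files(root: Path, count: int = 3) -> list[Path]:
    """Write `count` small placeholder files under `root` and return their paths.

    Hypothetical helper for illustration only; not a TorchGeo function.
    """
    root.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = root / f"tile_{i}.bin"
        path.write_bytes(bytes(16))  # 16 zero bytes stand in for real imagery
        paths.append(path)
    return paths


@pytest.fixture(scope="class")
def fake_data_root(tmp_path_factory) -> Path:
    # With scope="class" this body runs once per test class;
    # every test method in the class reuses the same directory.
    root = tmp_path_factory.mktemp("fake_dataset")
    make_fake_files(root)
    return root


class TestFakeDataset:
    def test_file_count(self, fake_data_root: Path) -> None:
        assert len(list(fake_data_root.glob("*.bin"))) == 3

    def test_file_size(self, fake_data_root: Path) -> None:
        assert all(p.stat().st_size == 16 for p in fake_data_root.iterdir())
```

Both test methods share one generated directory, so the cost of writing the files is paid once per class rather than once per test.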

Implementation

We already have data.py files for most datasets; all we need to do is convert these to pytest fixtures. Shared helper functions for writing common file types (GeoTIFF, PNG, JPEG, HDF5, etc.) would greatly simplify creating fake data and cut down on code duplication.
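One possible shape for such shared file-creation helpers is a small registry keyed by file extension. The sketch below is an assumption, not existing TorchGeo code: `write_fake_png`, `write_fake_file`, and `WRITERS` are hypothetical names, and only a PNG writer is shown, using the standard library alone. Writers for GeoTIFF or HDF5 would presumably wrap third-party libraries and are left out here.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(kind: bytes, payload: bytes) -> bytes:
    """Frame one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def write_fake_png(path: Path, width: int = 4, height: int = 4, value: int = 0) -> Path:
    """Write a tiny 8-bit grayscale PNG filled with a constant pixel value."""
    # IHDR fields: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, interlace 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter byte 0 ("none"), then one byte per pixel.
    row = b"\x00" + bytes([value]) * width
    pixels = zlib.compress(row * height)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", header)
        + _png_chunk(b"IDAT", pixels)
        + _png_chunk(b"IEND", b"")
    )
    return path


# Hypothetical registry: one writer per extension, shared by all dataset fixtures.
WRITERS = {".png": write_fake_png}


def write_fake_file(path: Path) -> Path:
    """Dispatch to the writer registered for the file's extension."""
    try:
        writer = WRITERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no fake-data writer registered for {path.suffix!r}") from None
    return writer(path)
```

A dataset fixture could then call `write_fake_file(root / "images" / "a.png")` for each file it needs, so the format-specific code lives in one place instead of being repeated in every data.py script.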

Alternatives

The alternative is to keep using data.py files. This lets us easily regenerate the data (for example, with larger file sizes), but it leads to a lot of code duplication and many files stored in the repo.

Additional information

No response

adamjstewart • Jul 02 '22 00:07