torchgeo
torchgeo copied to clipboard
Optional arguments to override RasterDataset's filenames
A project that we are collaborating on has a scenario here where it would be very useful to be able to set the filename_glob and filename_regex arguments to a RasterDataset subclass when it's being initialised, rather than always depend on them being set as class attributes.
In our scenario we could be creating a PairedDataset (a RasterDataset subclass) out of many different, user-supplied RasterDataset types:
return PairedDataset(
dataset_class,
root=root,
transforms=_transformations,
**subdataset_params["params"],
)
This PR suggests a change to replace the class attributes for filename_glob and filename_regex with optional arguments, if they're set, plus a small associated test.
@microsoft-github-policy-service agree company="Ordnance Survey"
We've actually had a lot of users ask for this feature. I guess my only question is, why stop at RasterDataset.filename_*? I'm wondering if other users will request the same thing for other attributes or classes.
My original thinking when I designed this was that RasterDataset and friends were basically private base classes used to define the rest of our datasets that users actually use. Since then, a lot of users have found RasterDataset useful for generic raster data and have started using that instead of defining new subclasses.
I'd have to think more about your use case, but I think it's still possible to create subclasses on-the-fly with the new filename_* attributes, but I imagine it would be easier to be able to pass these in as arguments instead. I'll have to think about whether or not a large redesign is required if we want to support this more generally for all GeoDatasets.
Thank you for the response @adamjstewart . In our case we have a RasterDataset subclass which wraps dynamically around multiple instances of another one, which has the class attributes set as you'd expect.
You can see the current workaround for this issue here, if you're looking for usage flavours: https://github.com/Pale-Blue-Dot-97/Minerva/blob/bc6aa3433336a715d7bb0ef41404b6fe1c6773a4/minerva/datasets.py#L230
I can see the rationale for "why only filename_*" prompting contemplation of a redesign, this is intended as the smallest non-harmful change...
Hey @metazool, did #1442 fix this for you?
Ping @metazool
Is this still a work in progress? We're preparing for a new release and trying to close out old issues/PRs.
Thank you for following this up, I was on sabbatical during the original earlier pings and am no longer working for the org supporting the torchgeo dependent project, though i do still follow its changes. It might be worth checking quickly with @Pale-Blue-Dot-97 whether it's still a problem, but i'm in principle happy to close this with no action