torchgeo icon indicating copy to clipboard operation
torchgeo copied to clipboard

Optional arguments to override RasterDataset's filenames

Open metazool opened this issue 2 years ago • 5 comments

A project that we are collaborating on has a scenario here where it would be very useful to be able to set the filename_glob and filename_regex arguments to a RasterDataset subclass when it's being initialised, rather than always depend on them being set as class attributes.

In our scenario we could be creating a PairedDataset (a RasterDataset subclass) out of many different, user-supplied RasterDataset types:

return PairedDataset(
                dataset_class,
                root=root,
                transforms=_transformations,
                **subdataset_params["params"],
            )

This PR suggests a change to replace the class attributes for filename_glob and filename_regex with optional arguments, if they're set, plus a small associated test.

metazool avatar Aug 17 '23 08:08 metazool

@microsoft-github-policy-service agree company="Ordnance Survey"

metazool avatar Aug 17 '23 08:08 metazool

We've actually had a lot of users ask for this feature. I guess my only question is, why stop at RasterDataset.filename_*? I'm wondering if other users will request the same thing for other attributes or classes.

My original thinking when I designed this was that RasterDataset and friends were basically private base classes used to define the rest of our datasets that users actually use. Since then, a lot of users have found RasterDataset useful for generic raster data and have started using that instead of defining new subclasses.

I'd have to think more about your use case, but I think it's still possible to create subclasses on-the-fly with the new filename_* attributes, but I imagine it would be easier to be able to pass these in as arguments instead. I'll have to think about whether or not a large redesign is required if we want to support this more generally for all GeoDatasets.

adamjstewart avatar Aug 18 '23 15:08 adamjstewart

Thank you for the response @adamjstewart . In our case we have a RasterDataset subclass which wraps dynamically around multiple instances of another one, which has the class attributes set as you'd expect.

You can see the current workaround for this issue here, if you're looking for usage flavours: https://github.com/Pale-Blue-Dot-97/Minerva/blob/bc6aa3433336a715d7bb0ef41404b6fe1c6773a4/minerva/datasets.py#L230

I can see the rationale for "why only filename_*" prompting contemplation of a redesign, this is intended as the smallest non-harmful change...

metazool avatar Aug 22 '23 16:08 metazool

Hey @metazool, did #1442 fix this for you?

calebrob6 avatar Oct 13 '23 20:10 calebrob6

Ping @metazool

adamjstewart avatar Jan 15 '24 15:01 adamjstewart

Is this still a work in progress? We're preparing for a new release and trying to close out old issues/PRs.

adamjstewart avatar Aug 06 '24 12:08 adamjstewart

Thank you for following this up, I was on sabbatical during the original earlier pings and am no longer working for the org supporting the torchgeo dependent project, though i do still follow its changes. It might be worth checking quickly with @Pale-Blue-Dot-97 whether it's still a problem, but i'm in principle happy to close this with no action

metazool avatar Aug 06 '24 12:08 metazool