Why do the samplers intersect the index with the ROI?
Why do the samplers call self.index.intersection(tuple(self.roi), objects=True) (line 116, single.py)?
GeoSampler, from which they all derive, appears to make the same call, but on dataset.index.
I'll have to think about this more, but I think you're right that it isn't strictly necessary to do this twice. If I recall correctly, the original reason was to avoid code duplication. The intersections in GeoSampler and RandomGeoSampler actually do slightly different things.
GeoSampler: reduce the total number of entries in the index, remove parts of entries that are outside of the ROI
RandomGeoSampler: find entries with area larger than the patch size
The latter can't be done in GeoSampler because it only works for datasets with area > 0; it doesn't work for point data like GBIF/EDDMapS/iNaturalist (recently added). But I think the former could be done in RandomGeoSampler at the same time; it would just result in code duplication.
I'll keep this in mind; we have a couple of open PRs to refactor the samplers a bit.
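To make the difference between the two intersections concrete, here is a rough sketch in plain Python. Everything in it is hypothetical and only for illustration: Box, clip_to_roi and fits_patch are not TorchGeo names, and the >= comparison is an assumption about how the size check behaves. It shows that clipping to the ROI keeps a zero-area point entry, while the patch-size filter would drop it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical 2D bounding box (not TorchGeo's BoundingBox)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def clip_to_roi(box: Box, roi: Box) -> Optional[Box]:
    """Return the part of `box` inside `roi`, or None if they do not overlap."""
    minx, maxx = max(box.minx, roi.minx), min(box.maxx, roi.maxx)
    miny, maxy = max(box.miny, roi.miny), min(box.maxy, roi.maxy)
    if minx > maxx or miny > maxy:
        return None
    return Box(minx, maxx, miny, maxy)


def fits_patch(box: Box, size: float) -> bool:
    """True if a size x size patch fits inside `box` (assumed >= comparison)."""
    return (box.maxx - box.minx) >= size and (box.maxy - box.miny) >= size


roi = Box(0, 10, 0, 10)
raster_tile = Box(-5, 8, 2, 20)  # raster-like entry, partly outside the ROI
point_obs = Box(3, 3, 4, 4)      # point-like entry with zero area

# Step 1 (base sampler): clip every entry to the ROI; points survive this.
clipped_tile = clip_to_roi(raster_tile, roi)
clipped_point = clip_to_roi(point_obs, roi)

# Step 2 (random sampler only): keep entries that can hold a whole patch.
print(fits_patch(clipped_tile, 4))   # the clipped tile is 8 x 8
print(fits_patch(clipped_point, 4))  # a point can never hold a patch
```

This is why the size filter cannot move into the base class without breaking point datasets, whereas the clipping step is safe for both kinds of data.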
I think the intersection call can be removed from RandomGeoSampler (and the others), since the size check can be performed directly on self.index, which was already initialized in GeoSampler.
That means iterating over self.index to find all hits that are larger than the patch size.
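A minimal sketch of that idea, under loud assumptions: BaseSampler and RandomPatchSampler are made-up stand-ins for GeoSampler and RandomGeoSampler, bounds are plain (minx, maxx, miny, maxy) tuples, and self.index is an ordinary list rather than the spatial index the real samplers use. The point is only the control flow: the base class clips to the ROI once, and the subclass filters the stored entries by size without a second ROI query.

```python
from typing import List, Optional, Tuple

# (minx, maxx, miny, maxy); a plain tuple standing in for a bounding box
Bounds = Tuple[float, float, float, float]


def intersect(a: Bounds, b: Bounds) -> Optional[Bounds]:
    """Return the overlap of two boxes, or None if they are disjoint."""
    minx, maxx = max(a[0], b[0]), min(a[1], b[1])
    miny, maxy = max(a[2], b[2]), min(a[3], b[3])
    if minx > maxx or miny > maxy:
        return None
    return (minx, maxx, miny, maxy)


class BaseSampler:
    """Hypothetical stand-in for GeoSampler: clips entries to the ROI once."""

    def __init__(self, dataset_bounds: List[Bounds], roi: Bounds) -> None:
        self.roi = roi
        self.index: List[Bounds] = []
        for bounds in dataset_bounds:
            clipped = intersect(bounds, roi)
            if clipped is not None:
                self.index.append(clipped)


class RandomPatchSampler(BaseSampler):
    """Hypothetical stand-in for RandomGeoSampler."""

    def __init__(self, dataset_bounds: List[Bounds], size: float, roi: Bounds) -> None:
        super().__init__(dataset_bounds, roi)
        # No second ROI intersection: self.index is already clipped, so the
        # size check runs directly over the stored entries.
        self.hits = [
            b for b in self.index
            if b[1] - b[0] >= size and b[3] - b[2] >= size
        ]


sampler = RandomPatchSampler(
    [(0, 20, 0, 20), (8, 12, 8, 12), (50, 60, 50, 60)],
    size=5,
    roi=(0, 10, 0, 10),
)
print(sampler.index)  # entries overlapping the ROI, clipped to it
print(sampler.hits)   # subset of those large enough for a 5 x 5 patch
```

With a real spatial index in place of the list, enumerating every stored entry may itself require a query over the index's full extent, so the saving would be the redundant ROI clipping rather than the lookup.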