Better default for window_size in EquivalentSourcesGB

Open leouieda opened this issue 11 months ago • 0 comments

Description of the desired feature:

The window_size in gradient-boosted equivalent sources currently defaults to 5 km. This would completely break for problems that cover very large or very small areas. We used it because we needed a default, but it is not ideal.

A better default would be to estimate a square window that holds about 5k data points on average. 5k data points fit in the RAM of most computers, so it seems like a sensible default. Being conservative here means that we won't get memory errors from numpy in the majority of cases. In this case, the default would be window_size=None, and in .fit we would estimate a default value with:

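if self.window_size is None:
    # Area of the data region, given as (west, east, south, north) in meters
    area = (self.region_[1] - self.region_[0]) * (self.region_[3] - self.region_[2])
    ndata = data.size
    # Average data density over the region
    points_per_m2 = ndata / area
    # Area of a window that holds about 5k points at that density
    window_area = 5e3 / points_per_m2
    # Side length of the equivalent square window
    self.window_size_ = np.sqrt(window_area)
else:
    self.window_size_ = self.window_size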

And we use self.window_size_ internally.
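
To get a feel for the numbers, here is a minimal standalone sketch of the same estimate. The function name `estimate_window_size`, the `points_per_window` argument, and the input check are hypothetical and only for illustration; they are not part of Harmonica. It assumes the region is given as (west, east, south, north) in meters and that the data are spread roughly evenly over it.

```python
import numpy as np


def estimate_window_size(region, ndata, points_per_window=5e3):
    """
    Side length (in meters) of a square window that would hold about
    ``points_per_window`` data points on average.

    Hypothetical helper for illustration only, not a Harmonica function.
    Assumes ``region`` is (west, east, south, north) in meters.
    """
    west, east, south, north = region
    area = (east - west) * (north - south)
    # Guard against degenerate inputs (an assumption of this sketch; the
    # snippet in the issue does not do this check)
    if area <= 0 or ndata <= 0:
        raise ValueError("Region must have positive area and ndata must be > 0.")
    # Average data density over the region
    points_per_m2 = ndata / area
    # Area of a window holding the target number of points at that density
    window_area = points_per_window / points_per_m2
    return float(np.sqrt(window_area))


# One million points over a 100 km x 100 km region
window_size = estimate_window_size((0, 100e3, 0, 100e3), ndata=1_000_000)
print(f"window_size = {window_size:.1f} m")
```

For this example the density is 1e-4 points per square meter, so the window area is 5e7 square meters and the window is about 7.1 km on a side. Note that when the survey has fewer than 5k points in total, the estimated window comes out larger than the region itself.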

As with #424, I also think it's OK to break compatibility here without going through the hassle of a warning/deprecation cycle. But I will do it if others think it's needed.

Are you willing to help implement and maintain this feature?

Yes, but happy to let others do it since my time is limited.

leouieda, Jul 27 '23 10:07