pygmt
Handling data processing functions that output to a grid or table
Description of the issue
In the GMT command-line world, there are some data processing functions that can output to either a NetCDF grid or an ASCII table. Translating to Python/PyGMT, do we want to 1) have a single function that can output to both (depending on some flag), or 2) have two functions/methods, one that outputs to a grid and one that outputs to a table?
This is a list of functions that need to be handled:
- [x] triangulate #731
- [x] grdhisteq #1433
- [ ] etc
Originally posted by @weiji14 in https://github.com/GenericMappingTools/pygmt/issues/1433#issuecomment-923441121
I changed the implementation a bit relative to #731 to support ASCII or pandas.DataFrame output for writing out the equalized histogram.
Still, the code is a bit clunky in order to support four different output types (pandas.DataFrame, xarray.DataArray, netCDF, or ASCII). What would you think about having two PyGMT functions for GMT's grdhisteq module rather than just one? One function could write out the data ranges of histogram equalization to a pd.DataFrame or ASCII table and the other could write out the cumulative distribution statistics to a netCDF file or xarray.DataArray. I guess coming up with the names for these would be harder than the current implementation, but I think it would be more user friendly long-term.
Yeah I've debated a bit on whether to have 2 functions too, something like pygmt.grdhisteq.to_table() and pygmt.grdhisteq.to_grid() (implemented using Python classmethods), or maybe with an underscore like pygmt.grdhisteq_to_table() and pygmt.grdhisteq_to_grid() (implemented purely using Python functions). Tying this to https://github.com/GenericMappingTools/pygmt/issues/1318#issuecomment-855317785, I think the split into 2 may have to happen eventually, especially if we want to support more table-like outputs (ascii/numpy/pandas/geopandas/etc) like what Will is doing at grd2xyz #1284.
Possible implementation styles
This is how each implementation would look, using triangulate as an example.
Single function
def triangulate(data, outgrid=None, outfile=None):
    pass
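To make the trade-off concrete, here is a minimal sketch of the single-function style. It does not call GMT: `_toy_backend` is a hypothetical stand-in that returns plain Python lists, and treating `outgrid`/`outfile` as truthy flags is an assumption based on the poll wording, not PyGMT's actual behaviour.

```python
def _toy_backend(points, mode):
    """Hypothetical stand-in for the GMT call; this is not real triangulation."""
    if mode == "grid":
        # Toy "grid": entry [i][j] is x_j + y_i
        return [[x + y for x, _ in points] for _, y in points]
    # Toy "table": one row per pair of consecutive input points
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]


def triangulate(data, outgrid=None, outfile=None):
    """Single entry point; the return type depends on which flag is set."""
    if outgrid and outfile:
        raise ValueError("Set only one of 'outgrid' or 'outfile'.")
    if not outgrid and not outfile:
        raise ValueError("Set one of 'outgrid' or 'outfile'.")
    mode = "grid" if outgrid else "table"
    return _toy_backend(data, mode)


points = [(0, 0), (1, 0), (0, 1)]
grid = triangulate(points, outgrid=True)   # nested list (grid-like)
table = triangulate(points, outfile=True)  # list of point pairs (table-like)
```

The function has to check that the two flags are not combined, and callers cannot tell the return type from the function name alone. That is the clunkiness the thread is concerned about.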
Two Python functions
Have a common _triangulate function that handles both grid and table outputs, somewhat similar to _blockm.
def _triangulate(data, outgrid=None, outfile=None):
    pass
def triangulate_to_grid(data, outgrid=None):
    pass
def triangulate_to_table(data, outfile=None):
    pass
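A rough sketch of the two-function style follows. The helper bodies are toy stand-ins rather than GMT calls, `_write_rows` is a hypothetical helper, and the rule "return the result when no file name is given, otherwise write the file and return None" is an assumption made for illustration. A real grid would be written as netCDF, not as text.

```python
import os
import tempfile


def _triangulate(data, output_type):
    """Shared private helper; a toy stand-in for the real GMT call."""
    if output_type == "grid":
        # Toy "grid": entry [i][j] is x_j + y_i
        return [[x + y for x, _ in data] for _, y in data]
    if output_type == "table":
        # Toy "table": one (x1, y1, x2, y2) row per pair of consecutive points
        return [data[i] + data[i + 1] for i in range(len(data) - 1)]
    raise ValueError(f"Unknown output_type: {output_type!r}")


def _write_rows(rows, path):
    """Hypothetical helper: write rows as whitespace-separated ASCII text."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(" ".join(str(value) for value in row) + "\n")


def triangulate_to_grid(data, outgrid=None):
    """Grid-like output: returned if outgrid is None, else written to outgrid."""
    grid = _triangulate(data, "grid")
    if outgrid is None:
        return grid
    _write_rows(grid, outgrid)  # text here only to keep the sketch stdlib-only
    return None


def triangulate_to_table(data, outfile=None):
    """Table-like output: returned if outfile is None, else written to outfile."""
    table = _triangulate(data, "table")
    if outfile is None:
        return table
    _write_rows(table, outfile)
    return None


points = [(0, 0), (1, 0), (0, 1)]
table = triangulate_to_table(points)  # returned in memory

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "edges.txt")
    result = triangulate_to_table(points, outfile=path)  # written to disk
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
```

Each public function now has one kind of output, and the shared logic stays in one place.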
Two methods in a single Python class
class triangulate:
    def _triangulate():
        pass
    @staticmethod
    def to_grid(data, outgrid=None):
        pass
    @staticmethod
    def to_table(data, outfile=None):
        pass
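As a sketch of how the class style could be filled in, the example below uses the class purely as a namespace that is never instantiated. The method bodies are toy stand-ins, not GMT calls, and marking the private helper as a staticmethod as well is an assumption added here.

```python
class triangulate:  # lowercase on purpose, so calls read like triangulate.to_grid()
    """Namespace-style class; it is never instantiated."""

    @staticmethod
    def _triangulate(data, output_type):
        """Shared private helper; a toy stand-in for the real GMT call."""
        if output_type == "grid":
            # Toy "grid": entry [i][j] is x_j + y_i
            return [[x + y for x, _ in data] for _, y in data]
        # Toy "table": one (x1, y1, x2, y2) row per pair of consecutive points
        return [data[i] + data[i + 1] for i in range(len(data) - 1)]

    @staticmethod
    def to_grid(data):
        """Return grid-like output."""
        return triangulate._triangulate(data, "grid")

    @staticmethod
    def to_table(data):
        """Return table-like output."""
        return triangulate._triangulate(data, "table")


points = [(0, 0), (1, 0), (0, 1)]
grid = triangulate.to_grid(points)    # no instance needed
table = triangulate.to_table(points)
```

The call syntax groups both outputs under one name, at the cost of using a class where no state is kept.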
Are you willing to help implement and maintain this feature? Vote for which API style you prefer!
- A. :+1: Single function to do both grid/table output, i.e. pygmt.triangulate(outgrid=True) or pygmt.triangulate(outfile=True)
- B. :tada: The 'functional' style, i.e. pygmt.triangulate_to_grid() or pygmt.triangulate_to_table()
- C. :rocket: The 'class' method style, i.e. pygmt.triangulate.to_grid() or pygmt.triangulate.to_table()
- D. :eyes: Other suggestions on the names, or API design, please comment below!
P.S. Also xref #896 where there is a similar API design discussion on wrapping GMT functions that do either plotting or data processing.
I like the syntax of the class method style, but dislike using classes in a functional programming style with the staticmethod decorator. I would also prefer for the function/method names to be more descriptive regarding the output than 'to_table' or 'to_grid'. For example, triangulate.find_voronoi_edges or triangulate.find_delaunay_edges or triangulate.grid_data.
In https://github.com/GenericMappingTools/pygmt/tree/grdhisteq-functions, I tried to implement a syntax similar to the class based option and the pygmt.datasets.load_* functions while still keeping the design functional. The functions work and I think the syntax is actually quite user-friendly, however, I could not get the import statements working for autodoc. Any advice here would be appreciated. I mixed up merge commits and needed to close #1571 due to divergence with the grdhisteq branch. I could either discard the class-based design, discard the functional design, or open a different PR from grdhisteq-functions with main as the target branch to compare the two options.
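For illustration only, one way to get class-like call syntax from plain functions is to group them under a namespace object. This is not necessarily how the branch above is organised. A real package would more likely use a submodule, and `SimpleNamespace` is used here only to keep the sketch in one file. The example uses triangulate rather than grdhisteq, the function names are the ones suggested earlier in the thread, and the bodies are toy stand-ins.

```python
from types import SimpleNamespace


def _grid_data(points):
    """Toy stand-in for gridding; entry [i][j] is x_j + y_i."""
    return [[x + y for x, _ in points] for _, y in points]


def _find_delaunay_edges(points):
    """Toy stand-in for edge finding; pairs up consecutive points."""
    return [points[i] + points[i + 1] for i in range(len(points) - 1)]


# Group the plain functions under one name so that calls read like methods.
triangulate = SimpleNamespace(
    grid_data=_grid_data,
    find_delaunay_edges=_find_delaunay_edges,
)

points = [(0, 0), (1, 0), (0, 1)]
grid = triangulate.grid_data(points)
edges = triangulate.find_delaunay_edges(points)
```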
The implementation of grdhisteq in https://github.com/GenericMappingTools/pygmt/pull/1433 uses the "Two methods in a single Python class" style and is currently on a final review call. If that PR gets merged, I think we should stick with that style for the other functions that output to a grid or table for consistency. So, please comment either here or in that PR if anyone does not like that design choice.
Thanks Meghan for getting the grdhisteq function done. I'll refactor the triangulate implementation in #731 to use a similar "Two methods in a single Python class" style to be consistent.
Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide. The guidance could be added as a follow-up to https://github.com/GenericMappingTools/pygmt/pull/1687.
> Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide.
The contributing guide is already too long. Do you think we should add a code style guide instead? Here is an example from ObsPy: https://docs.obspy.org/coding_style.html.
Yes, a code style guide would be a good alternative. We could also move a bunch of the other information into a docs style guide if it's important to have the base contributing guide shorter.